Skip to main content

I21 IPORM the IPFS ORM

I21 IPORM

Background​

This document provides an idea in response to R24 IPFS ORM. We seek to define an MVP spec that serves the requests set out there.

Certain of the requests might not be serviceable in earlier passes of the work, in which case further intentions and directions will be discussed at the end of this document.

Scope​

For the purposes of this iteration, we take the following request items as critical to the resulting deliverable being of any value.

  1. Developer interface: providing an interface for managing CID's and managing iterations on the relationships between the data they represent, whilst hiding the details of this management.
  2. IPFS interface: using an abstracted interface for the underlying machinery so that a mock can be used to test correctness in a unit test.
  3. IPLD schema checking: Optionally associating each instance with an IPLD schema, and when provided, the instance should have tooling to check data against that schema.
  4. Encryption: Encrypting items stored into IPFS before hashing them.
  5. Schema change management: Suppressing explicit object relationship management allows developers to focus on data structure more and data management lifecycle less, which should result in easier schema changes to the core code.
  6. Upstream change management: Decoupling the IPFS libraries from their consumption to allow them to change freely without requiring the consuming codebase to change

The rest of the requests will be handled or designed for based on their ease of implementation.

Motivation​

There are a number of reasons to explode data into components for standalone storage on IPFS:

  1. Different parties are interested in different parts of the data
  2. Different parts of the data have different churn rates and resource requirements
  3. Data structures are big enough that they need to be managed by reference
  4. Resiliency, ie getting data to replicate as quickly as possible
  5. Data type reuse in multiple contexts

Broadly, these reasons are similar to the reasons you might choose to add a new foreign key to a database, so surveying prior art should help understand design decisions.

ORMs typically have two layers:

  1. The modeling language, which allows a developer to
    1. Specify the shape of the data the model is handling
    2. Specify a schema that can check incoming or outgoing data for adherence to expected shape (can be folded in to (1) if the ability to tag attributes is sufficiently well developed)
    3. Specify any HAS A relationships to other data items that are also specified using the ORM
    4. Declare any functions or methods meant to process the data the class handles
  2. The processor/client, which allows a developer to
    1. Instance objects from a serialized format, including any nested objects, providing hydrated versions
    2. Serialize objects from live instances, including any nested objects
    3. Check during serialization/deserialization for adherence to a schema (sometimes automatic)
    4. Provide lifecycle hooks during serialization/deserialization

Examples​

Prisma user schema (model layer)​

model User {
id Int @id @default(autoincrement())
email String @unique
name String?
role Role @default(USER)
posts Post[]
profile Profile?
}

Key notes:

  1. Every property has a declared type
  2. Some properties have types that are defined elsewhere in the ORM models
  3. Properties have standard symbols that equip the processor with extra behaviour such as β€œ?” for optional and β€œ[]” for many
  4. There’s an additional attribute language so that properties can be equipped with even more user defined behaviour during processing
  5. The model is parsed from the schema, which means that schema checking can be automatic and constant
  6. This approach requires crafting an additional parser
  7. There is no way in this schema language to declare methods that should come with the live object instances

Prisma client (processor layer)​

const user = await prisma.user.findUnique({
where: {
email: 'emma@prisma.io',
},
select: {
email: true,
posts: {
select: {
likes: true,
},
},
},
})

const deletePosts = prisma.post.deleteMany({
where: {
authorId: 7,
},
})

Key Notes:

  1. Prisma only provides interface level functionality on the client library/processor, eg CRUD
  2. Any extra verbs you would wish to add to types (eg β€œtagPosts(tag)”) would need to exist in a higher level of the program that loads and writes objects instead of just being on the newly hydrated objects
  3. There’s still quite a heavy syntax load inside the CRUD function arguments that don’t entirely escape from thinking like an RDBMS
  4. Relations can also be queried from root nodes in the tree

Objection User Model​

class Person extends Model {
static get tableName() {
return 'persons'
}

static get idColumn() {
return 'id'
}

fullName() {
return this.firstName + ' ' + this.lastName
}

static get jsonSchema() {
return {
type: 'object',
required: ['firstName', 'lastName'],

properties: {
id: { type: 'integer' },
parentId: { type: ['integer', 'null'] },
firstName: { type: 'string', maxLength: 255 },
lastName: { type: 'string', maxLength: 255 },
age: { type: 'number' },
address: {
type: 'object',
properties: {
street: { type: 'string' },
city: { type: 'string' },
zipCode: { type: 'string' },
},
},
},
}
}

static get relationMappings() {
const Animal = require('./Animal')
const Movie = require('./Movie')

return {
pets: {
relation: Model.HasManyRelation,
modelClass: Animal,
join: {
from: 'persons.id',
to: 'animals.ownerId',
},
},

movies: {
relation: Model.ManyToManyRelation,
modelClass: Movie,
join: {
from: 'persons.id',
through: {
from: 'persons_movies.personId',
to: 'persons_movies.movieId',
},
to: 'movies.id',
},
},
}
}
}

Key notes:

  1. Although much less terse, this model contains many of the same elements: schema checking, automatic de/serialization of children, decorating of certain properties with extra attributes (eg the relation property on relationMapping keys)
  2. In fact, this class definition could easily be a compile target for the prisma schema language
  3. Defining things in terms of class inheritance gives the developer the opportunity to add extra methods to the hydrated object
  4. Harder to learn

Objection queries (processing layer)​

const people = await Person.query().withGraphFetched({
pets: true,
children: {
pets: true,
children: true,
},
})

for (const person of people) {
console.log(person.fullName())
}

Key notes:

  1. Has instance methods on the results objects
  2. Can query for a whole graph at once
  3. Treats objects as transient and composable by forming them with a query, which will give a different answer at a different point in time
  4. Does not have a 1:1 mapping between hydrated and dehydrated items, ie: a hydrated query result cannot be dehydrated

Proposal​

Principles​

With the background properly in place, we are ready to propose an intended interface. We adhere to the following rules in this proposal

  1. Use built in language features for MVP efforts: this strongly favours a modeling language like objection, which can eventually be a compile target for something fancier like graphql or prisma
  2. Keep it simple: Don’t try to mix schemas and properties and relations into the some unwieldy format
  3. Colocate everything that is known statically about a model: Make it very easy to understand all the facts about a class you are looking at without jumping to a bunch of files
  4. Provide an extensible annotation system for properties: provision for the ability to add tags/functions to properties whose code is managed orthogonally. This will make it easier to add cross cutting behaviours in the future.
  5. Assume the underlying layer is injected at runtime: so we can always test quickly with stubs
  6. Assume the basic use case is crushing and uncrushing data from a root object: the main thing this needs to do properly is move things on and off the wire

API Example​

/** @jsdoc
Here is where documentation about the Dmz would go, so that jsdoc can automatically
generate documentation for the models.
**/
class Dmz extends Model {
fullName() {
return this.firstName + ' ' + this.lastName
}
static get ipldSchema() {
return `
type Dmz struct {
config &Config
timestamp Timestamp # changes every block
network Network # block implies net change
state &State
pending optional &Pending
approot optional PulseLink # The latest known approot
binary optional Binary
covenant Covenant
}
`
}
postUncrush() {
// lifecycle hook
assert(this.fullName().length < 36)
}
preCrush() {
delete this.transientVar
}
}

const iporm = await IPORM.open(getFn, putFn, classes, {
checkSchema: false,
})
await iporm.uncrush.DMZ(cid, decrypter)
await iporm.crush(dmzInstance, encrypter)

Hints​

  1. Timestamp class needs dependency injection
  2. No need for partial graph hydration
  3. State can’t have any nested keys crushed
  4. Resolver can throw, but must eventually backed by a stateful object
  5. Binary can’t be hydrated
  6. Keys can be changed on the fly
  7. Keys need to be stored on the pulsechain
  8. Changing keys is out of scope due to problems knowing what key to use for a historical Block referred to by the current Pulse

Tests​

This Idea is considered implemented once these tests pass:

Output is a standlone npm package​

The folder for this module is in dreamcatcher-stack/pkg/iporm and is the only output. This should be published to npm and consumable as such.

Replace existing IPLD schema specification​

Currently the IPLD specification for Interpulse is in subfolder w006-schemas. This folder should be able to be deleted once the Interpulse codebase is shifted to using the iporm package, with no loss of fidelity in the descriptional information currently present.

Replace existing IPLD ORM​

Currently the ORM used by Interpulse is in subfolder w008-ipld. This folder should be able to provide the same functionality it currently does, but by depending entirely on iporm for its IPFS ORM functions instead of its current adhoc ORM.

All 6 scope elements are met​

The scope defines 6 crucial elements that must be judged to have been met by the implementation in a form at least free of any critical to function bugs or deficiencies.