Skip to main content

R24 IPFS ORM

R24 IPFS ORM

During the course of I06 IPFS Rewrite a class system was constructed to easily map back and forth between data in our models and IPFS based data. This system appears generic enough that it can be factored out into a standalone module, stabilized, tested, and performance tuned, and then used to add on a new feature for the Interpulse core to inherit: Encryption at Rest.

This library should not strive to used inside of chainland. We may wish to supply an ORM designed specifically for modelling objects as chains in our system, but we should not encourage the usage of mappings between IPFS data blocks inside our chains directly. In the state of a chain, the correct way to hash some data and link to it, is to create a child chain, and so any ORM we produce would favour this method, not a direct IPFS link.

The primary benefit to us, the builders of the Dreamcatcher Blockchain, is for change management as the schema of the chain changes. Making this a separate library gives good confidence that the code is correct, puts a barrier in place to keep this funcitonality separate from all else, and enables the addition of Encryption without getting tangled in the core code.

The suggested scoping of the solution to this Request is to not consider any external utility for this ORM until we have used it internally ourselves, first.

Design​

This could be a nested design like the current implementation, where any class in the system can be reinstantiated using static methods on that class. It could also be a centralized design, where a central factory type method is called to do hydration and dehydration.

Developer Interface​

The utilization of this library must be via class inheritance. Any class wishing to have CID abilities must inherit from one of the supplied classes. The data that is to be serialized must be POJO own properties on the class instance.

Developers must be able to indicate if they want a particular class to be hashed out every time, some times, or none of the time. An example of this is to use the same class in different places, one punched out, one not.

Each object must be an immutable class, and all ORM methods must return a new instance of the class. Developers should be encouraged never to mutate their internal class state.

During hydration and dehydration, the developers class must have the opportunity to perform logic checks, to ensure the data their classes contain is logically correct, not just of the correct type.

Versioning should be handled somehow, but is not required as the use case is supposed to do its own version management.

IPFS interface​

The ORM should do its operations using two function calls only:

  1. await put( Block ) where Block is an IPLD Block, which contains a CID
  2. await get( CID )

The implementation of these functions is outside of the scope of the ORM, and for testing may be as simple as a Map(). Errors in calling these functions should bubble up. A possible extension is to use chains themselves to store and resolve the put and get function, since the backing store is independent of what the ORM is doing.

IPLD schema checking​

The ORM must allow the optional inclusion of schema for the classes similar to w006-schemas which may be supplied to the ORM using some kind of map interface, such as ipldSchemas.js

Devs must be able to specify if they want schema checking on at hydration, on at dehydration, as well as if they want the hydrated objects frozen.

Diffing​

Managing diffing is the implementors choice, but for the same of performance, the library should be aware of diffs between each object it is managing.

Caching​

An object cache should be maintained to make hydration requests fast. The size of this cache should be tracked based on the size of the hashed data that make up the objects, and memory pressure should cause ejection. This is complicated by the sharing of data between objects, but being naive and too agressive with ejection is better than the alternative.

Performance​

A set of benchmarking tests making different types of use of the library should be set up and run each commit so as to keep performance high. Being classes the overhead should be minimal.

HAMT​

A special type of object that must be supported is the Hash Array Mapped Trie as this object is the key to being able to have constant time additions of a new channel to a pulsechain, even with the number of channels is large.

This device adds the complexity that it might need to be rehydrated during system operations, rather than all at once at the start.

Lightweight Instantiation​

The HAMT structure cannot be fully instantiated during normal operations due to its potential size overloads. Hence the ORM needs to permit partial inflation of classes with optional targetted hydration at an arbitrary point in time.

The ORM may allow partial hydration in other parts of the application too. This hydration can be triggered explicitly or can be implicit by making some class functions async. Care must be taken when leaving a live ipfs resolver function available to hydrate in such on demand scenarios.

Hydration Permissions​

The use of this inflation must be carefully managed so that only CIDs that the current model is entitled to view are available. For example, if a hydration function was passed into userland, malicious code might ask it to inflate CIDs that it should have no access to. The implementation may solve this by specifying a root CID, and then only ever allowing

If such a scheme is implemented, then care should be taken to recognize special classes that indicate the end of automatic hydration, such as a PulseLink. This is to stop a malicious inflation from walking back through the pulsechain, when they are only permitted to view

The scheme may allow some classes to be marked as end of hydration, for example the State class might be marked as such, to stop malicious code making a CID to some forbidden blocks and then using hydration to access them.

Stopping Hydration​

Special objects like Address, Pulselink, and Binary are classes that represent direct IPFS concepts, rather than a layer above like normal ORM classes do. These signal a break in the hydration process, so that inflating one pulse does not inflate all the lineage too, nor does it inflat the binary of the chain, nor the pulses of all the channels that the pulse is connected to.

However the data is present in such a way as it can be fed back into the ORM somehow and then another segment of hydration can be worked upon, again stopping at the developer specified boundaries.

Encryption​

Must provide a pluggable encryption scheme, where hashes refer only to encrypted blocks of data, and that each block of data contains no plaintext information about what other blocks it links to. This is a feature not present in any iteration of our blockchain, and would imbue our systems with encryption at rest.

During encryption the system must still take advantage of diffing of objects.

Optional Dynamic Dehydration​

We may make the decision to crush some parts of the tree that any given model represents dynamically based on performance considerations. It is much faster to initially crush the entire pulse as one single POJO and then pick pieces out of it later. This is not a requirement for the ORM but might be implementable easily depending on other design decisions. It would be done solely to help performance speed.

Object types​

Plain object​

Deeply Nested ORM object​

Array of plain objects with deeply nested children​

Array of CID objects​