6.3.8. Data Structures
Common data structures for storing data on-chain.
A totally ordered multiset is a multiset + a relation on the set that satisfies the conditions for a partial order + an additional condition known as the comparability condition. This data structure has a variety of helpful operators for iterating.
storage-tomulset-index constructs a multiset whose elements are totally ordered sets (by insertion order). It allows paginated retrieval of elements in a named set.
storage-toset-next-iters constructs an iterator across a set of tosets with arguments: whether to forward iterate, bookmark, first toset, and &rest of other tosets. Takes the following operations.
'enumerate returns a vector of 2 elements: a list of items and an empty bookmark. This can also take a filter.
'page takes a page size + filter and returns a vector of 2 elements: a list of items and a bookmark of the last item accessed. To access the next group of items, provide the bookmark to the
B+ trees are available for optimizing data storage. Storing key/value pairs in a B+ trees on the ledger is beneficial for two reasons. Firstly, loading multiple keys may be more efficient if locality of reference exists -- there is the possibility that a B+ tree may only need to load one leaf node to satisfy a number of sequential reads in the same node. Secondly, when a B+ tree is backed by a private data collection (PDC), it can support range queries where the PDC itself cannot. Naturally, B+ trees are not appropriate to all use cases. Some use cases may see decreased performance if B+ trees cause excessive unnecessary I/O, or additional MVCC conflicts due to false sharing.
Prior to the existence of the B+ tree implementation, the platform had supported tomulsets. After the addition of B+ trees, a compatibility layer was added that allows clients using the tomulset API to switch to B+ tree backing (a data migration may also be necessary). Each tomulset is emulated using two B+ trees, called
storage-tomulset-index takes a keyword parameter called
factory-toset-index. An argument can be supplied that is the result of invoking
storage-toset-index-bptree-factory. The latter function takes optional parameters
eat-hdv. Each of the two B+ trees,
hdk specifies the number of keys in interior nodes.
hdv specifies the number of key/value pairs in leaf nodes. B+ trees have no values in interior nodes.
Performance tuning: Setting
hdk lower will limit the branching factor of the tree and increase its depth, requiring more lookups to localize a particular key. Setting
hdk higher may cause delays related to the I/O required to load the large nodes from the ledger, and to search for keys within the interior node (which is still done by linear scan currently). The same consideration applies to setting
hdv higher. Setting
hdv lower will increase the number of nodes in the tree.
The choice of parameter doesn't depend much on the query size. The choice of parameters depends more on the average (or better yet, distribution) of the key length and value length as well as the access pattern. The goal is for interior nodes (which have only keys) as well as leaf nodes (which have keys and values) to generally fit into a chosen I/O size that strikes a balance between providing a high throughput on the one hand and not exceeding the diminishing returns of locality of reference on the other. So, in summary, the choice of parameter depends on the key and value sizes so it's best to try a few different values empirically with real-world data, benchmark them with a realistic or real-word access pattern, and choose the best results.
(export 'storage-toset-index-bptree-factory) (defun storage-toset-index-bptree-factory (&optional ixo-hdk ixo-hdv eat-hdk eat-hdv))
'put inserts a given item into the tomulset of a provided key.
'del deletes a given element from a provided key.
'has? returns true if a provided key is in the tomulset, false otherwise.
'count returns the number of items of a provided key.
'page pages forward the tomulset with a given filter, bookmark, and page size.
'page-back pages backward the tomulset with a given filter, bookmark, and page size.
'enumerate returns the entire list of items of a provided tomulset ket.
'dump evaluates to the DB key/value pairs that define the tomulset contents.
'free deletes the entire tomulset for the given key.
'next-iter creates an iterator for the given key, can be used with
'prev-iter creates an iterator that goes backwards for the given key, can be used with
'get-set makes a toset provided a key to the tomulset.