B Tree with varying maximum keys?

Question

I am wanting to implement something along the lines of a btree to index some data using variable length keys, where I am expecting that each node in the tree will look something like this,

struct key_block {
  block_ptr parent; // link back up the tree to the parent
  unsigned numkeys; // number of keys currently used by this block
  struct {
    block_ptr child; // points to the child immediately preceeding this key.
    struct {
      unsigned length; // how long this key is
      unsigned offset; // where the data for this key is
    } key; // support for variable length keys
    data_ptr content; // ptr to the data indexed by this key
  } entries[]; // as many entries as will fit on a disk block. 
}; // the last entry will be followed by another block_ptr which is the right hand child of the last node.

What I am intending to is store the actual key data in the same disk block as the node itself, positioned just after the final key and its right hand child that was within the node. The offset and length fields in each key will indicate how far from the start of the current block the actual data for each key begins and how long it runs for.

However, I am wanting to use a fixed size of disk block for my storage, and since I am wanting to store the variable length keys inside of the same block as the node, that means that the maximum number of keys that can be in a node depends on the length of the keys in that node. This sort of contradicts my understanding of the way a btree generally works, where all nodes have a fixed maximum number of entries, and I am not sure whether I can really implement this using a btree at all because I am violating that typical invariant.

So should I even be looking at using a btree structure? If not, what other alternatives exist for very fast external searching, insertion, and deletion? In particular, a key criteria for any solution must be that it is highly scalable to supporting very VERY large numbers of entries, and still be efficient for searching, insertion, and deletion (and btrees perform adequately on this front).

If I can still use a btree, how would the algorithm be affected when I no longer have an invariant maximum number of keys, but instead the maximum depends on the content of each individual node itself?

B Tree with varying maximum keys?

Answers (1)

Related Questions