Reputation: 2826
I am wanting to implement something along the lines of a btree to index some data using variable length keys, where I am expecting that each node in the tree will look something like this,
struct key_block {
block_ptr parent; // link back up the tree to the parent
unsigned numkeys; // number of keys currently used by this block
struct {
block_ptr child; // points to the child immediately preceeding this key.
struct {
unsigned length; // how long this key is
unsigned offset; // where the data for this key is
} key; // support for variable length keys
data_ptr content; // ptr to the data indexed by this key
} entries[]; // as many entries as will fit on a disk block.
}; // the last entry will be followed by another block_ptr which is the right hand child of the last node.
What I am intending to is store the actual key data in the same disk block as the node itself, positioned just after the final key and its right hand child that was within the node. The offset and length fields in each key will indicate how far from the start of the current block the actual data for each key begins and how long it runs for.
However, I am wanting to use a fixed size of disk block for my storage, and since I am wanting to store the variable length keys inside of the same block as the node, that means that the maximum number of keys that can be in a node depends on the length of the keys in that node. This sort of contradicts my understanding of the way a btree generally works, where all nodes have a fixed maximum number of entries, and I am not sure whether I can really implement this using a btree at all because I am violating that typical invariant.
So should I even be looking at using a btree structure? If not, what other alternatives exist for very fast external searching, insertion, and deletion? In particular, a key criteria for any solution must be that it is highly scalable to supporting very VERY large numbers of entries, and still be efficient for searching, insertion, and deletion (and btrees perform adequately on this front).
If I can still use a btree, how would the algorithm be affected when I no longer have an invariant maximum number of keys, but instead the maximum depends on the content of each individual node itself?
Upvotes: 0
Views: 467
Reputation: 10863
There is no fundamental issue with a variable number of maximum keys in a B-tree. However, a B-tree does depend on some minimum and maximum number of keys in each node. If you have a fixed number of keys per node, then this is easy (usually N/2 to N nodes). Because you allow a variable number, you will need to determine a heuristic for balancing the tree. The better the heuristic, the more optimal the performance.
Fortunately, the issue will merely be performance. The shape of the B-tree has several invariants, but none of them are affected by your variable number of keys, so you will still be able to search. It just might be a poorly-balancing structure if you choose a bad heuristic.
Upvotes: 3