Reputation: 2396
I've been researching B+ trees and T-trees lately. There seems to be a trend in which B+ trees are used for on-disk indexes and T-trees are used in memory.
I believe it is due to disk I/Os, but I can't find anything confirming that notion. Am I correct in that assumption?
Also, if disk accesses for T-trees could be limited to log B via caching, couldn't they outperform B+ trees at logB N?
Upvotes: 1
Views: 487
Reputation: 7698
The T-Tree is essentially a binary tree. So the tree depth is something like log2(N/B) for the T-Tree vs. logB(N) for the B+Tree (N=#data items, B=number of keys stored in each node=branch factor for B+Tree). These are approximate as neither tree has a fixed number of items in each node. Anyway, for large N, the B+Tree will win handily on that metric. Under the assumption of uniform memory access, that figure doesn't matter but it really matters on secondary storage where it is roughly the number of secondary storage accesses. It also matters on modern machines with hierarchical memory (the original T-Tree paper tested on a Vax 11/750).
I make assumptions above surrounding how you update both structures for the respective environments. I believe they are symmetric and fair. Primarily I assume that data and keys are stored by reference in memory and by copy on secondary storage. Failure to adjust the structures in that way would be disastrous for the T-tree, which has uniform access cost as the core of its design, as each key compare would require external access. For non-fixed size data, some other packing adjustments are needed in both cases (and used in the real world).
Upvotes: 1