cforfun
cforfun

Reputation: 135

compare B+tree implementation: storing internal nodes on disk

is there any implementation where internal nodes of B+tree is also stored on disk? I am just wondering if any one is aware of such an implementation or see real advantage doing it this way? Normally, one stores the leaf nodes on disk and develop the B+ tree as per need.

But it is also possible to save the current state of B+tree's internal nodes (by replacing the pointers by disk block number it points to): I see there are other challenges like keeping the internal nodes in memory in sync with the disk blocks: but the B+ tree may be implemented on nvram or say battery backed dram or some other method to keep it in sync.

Just wondering if anyone has already implemented it this way like linux's bcache or another implementation?

cheers, cforfun!

Upvotes: 1

Views: 409

Answers (1)

DarthGizka
DarthGizka

Reputation: 4665

All persistent B+Tree implementations I've ever seen - as opposed to pure 'transient' in-memory structures - store both node types on disk.

Not doing so would require scanning the all the data (the external nodes, a.k.a. 'sequence set') on every load in order to rebuild the index, something that is feasible only when you're dealing with piddling small amounts of data or very special circumstances.

I've seen single-user implementations that sync the disk image only when the page manager ejects a dirty page and on program shutdown, which has the effect that often-used internal nodes - which are rarely replaced/ejected - can go without sync-to-disk for a long time. This is somewhat justified by the fact that internal ('index') nodes can be rebuilt after a crash, so that only the external ('data') nodes need the full fault-tolerant persistence treatment. The advantage of such schemes is that they eliminate the wasted writes for nodes close to the root whose update frequency is fairly high. Think SSDs, for example.

One way of increasing disk efficiency for persisted in-memory structures is to persist only the log to disk, and to rebuild the whole tree from the log on each restart. One very successful Java package uses this approach to great advantage.

Upvotes: 1

Related Questions