Reputation: 660
I currently have some implementation which performs Longest Prefix Lookup (LPM) from a radix tree data structure. This data structure contains IP prefixes (radix tree with max depth of 128 bits), and it is currently fully maintained in memory. The data is a read-only data and is never modified.
Unfortunately the data got bigger and I am no longer able to maintain it in memory. I am looking for an efficient data structure which can be held on disk and will provide efficient lookups. I am planing to use some cacheing mechanism on top of it.
Upvotes: 2
Views: 357
Reputation: 8544
efficient data structure which can be held on disk and will provide efficient lookups
Looks like an XY Problem to me:
If those assumes above are correct, we need to store 50K times 128 bit = 800KB in a radix tree. It should fit into the memory easily.
We need to lookup this data millions times per second. Placing the data onto a disk is an order of magnitude slowdown.
My guess is that your question is not your root issue, so please:
So it is 2M IPv6 lookups per hour over a read-only 20M set of IPv6 prefixes. We still don't know what are those 20M prefixes, since from the practical point of view there are no that many prefixes in whole Internet. But anyway.
2M lookup/hour makes ~550 lookups/sec. According to Wikipedia, traditional (mechanical) hard drives cannot sustain such a load. So we need an SSD drive to store the database.
According to Wikipedia, B-trees are good structures for external search.
According to Wikipedia, Memory-mapped files use small amounts of RAM even for a very large file.
Putting all bits together. Those 20M IPv6 prefixes should be organized as a B-tree with node size PAGE_SZIE
(usually 4KB) and mapped into the virtual address space of the process. The cache will be managed by operating system, so no other caching mechanism is needed.
Upvotes: 1