michael
michael

Reputation: 660

Longest Prefix Lookup from Disk

I currently have some implementation which performs Longest Prefix Lookup (LPM) from a radix tree data structure. This data structure contains IP prefixes (radix tree with max depth of 128 bits), and it is currently fully maintained in memory. The data is a read-only data and is never modified.

Unfortunately the data got bigger and I am no longer able to maintain it in memory. I am looking for an efficient data structure which can be held on disk and will provide efficient lookups. I am planing to use some cacheing mechanism on top of it.

Upvotes: 2

Views: 357

Answers (1)

Andriy Berestovskyy
Andriy Berestovskyy

Reputation: 8544

efficient data structure which can be held on disk and will provide efficient lookups

Looks like an XY Problem to me:

  1. We are talking about IPv6 (128 bits).
  2. We are talking about IPv6 routing (Longest Prefix Match).
  3. For 10 Gbit/s link there might be up to 14,8 million route lookups per second.
  4. According to public data, there are around 50K IPv6 prefixes at the moment.

If those assumes above are correct, we need to store 50K times 128 bit = 800KB in a radix tree. It should fit into the memory easily.

We need to lookup this data millions times per second. Placing the data onto a disk is an order of magnitude slowdown.

My guess is that your question is not your root issue, so please:

  1. Include information about a broader picture along with any attempted solution.
  2. If there are other solutions you've already ruled out, share why you've ruled them out. This gives more information about your requirements.

UPDATE:

So it is 2M IPv6 lookups per hour over a read-only 20M set of IPv6 prefixes. We still don't know what are those 20M prefixes, since from the practical point of view there are no that many prefixes in whole Internet. But anyway.

  1. 2M lookup/hour makes ~550 lookups/sec. According to Wikipedia, traditional (mechanical) hard drives cannot sustain such a load. So we need an SSD drive to store the database.

  2. According to Wikipedia, B-trees are good structures for external search.

  3. According to Wikipedia, Memory-mapped files use small amounts of RAM even for a very large file.

Putting all bits together. Those 20M IPv6 prefixes should be organized as a B-tree with node size PAGE_SZIE (usually 4KB) and mapped into the virtual address space of the process. The cache will be managed by operating system, so no other caching mechanism is needed.

Upvotes: 1

Related Questions