Xeoncross
Xeoncross

Reputation: 57294

How to create, update and read a radix tree that won't fit into memory?

I’m interested in using a radix tree (or Patricia trie) to store a hash/dict/array of strings -> values. However, I'm finding that I have too many strings to fit into memory.

I found an article by Algolia about how they solved this problem with their search index and they talk about doing what I’m trying to do: flushing a radix tree to disk as each branch is constructed and only reading back the branches you need.

However, they don’t mention how they do this. The only way I can think of storing a radix tree is either as a full (serialized) object or as a hash/array as a simple Key/Value store.

For example, using a key/value store

SET smile:  [...values...]
SET smiled: [...values...]
SET smiles:  [...values...]
SET smiling: [...values...]

Then doing a prefix scan to pull out keys/values that MATCH smil*. However, this kind of loses the space-saving benefits of a radix tree plus it would require reconstructing at least part of the radix tree on load.

Upvotes: 6

Views: 522

Answers (1)

StarShine
StarShine

Reputation: 2060

Why not hash the strings in a pre-processing step, store the mapping out-of-core and build the trie on the hashes instead? This should significantly reduce the memory load since only the hashes are left to consider.

Upvotes: 0

Related Questions