Reputation: 57294
I’m interested in using a radix tree (or Patricia trie) to store a hash/dict/array of strings -> values
. However, I'm finding that I have too many strings to fit into memory.
I found an article by Algolia about how they solved this problem with their search index and they talk about doing what I’m trying to do: flushing a radix tree to disk as each branch is constructed and only reading back the branches you need.
However, they don’t mention how they do this. The only way I can think of storing a radix tree is either as a full (serialized) object or as a hash/array as a simple Key/Value store.
For example, using a key/value store
SET smile: [...values...]
SET smiled: [...values...]
SET smiles: [...values...]
SET smiling: [...values...]
Then doing a prefix scan to pull out keys/values that MATCH smil*
. However, this kind of loses the space-saving benefits of a radix tree plus it would require reconstructing at least part of the radix tree on load.
Upvotes: 6
Views: 522
Reputation: 2060
Why not hash the strings in a pre-processing step, store the mapping out-of-core and build the trie on the hashes instead? This should significantly reduce the memory load since only the hashes are left to consider.
Upvotes: 0