Neir0
Neir0

Reputation: 13367

Memory efficient way to store strings

Assume I have millions strings. Each string has an int value. I want to retrieve this value by input string but I don't want to store all this strings because they take a lot of space. I can't use hash table because of it need to store all or at least many strings in memory. So what is good data structure for my case (I don't need to add or delete any strings, I am already have prepared data and read is only allowed operation)

Upvotes: 6

Views: 3125

Answers (4)

Paul Rubel
Paul Rubel

Reputation: 27222

If you can can pre-process the word list take a look at perfect hashes, like CMPH. ( gperf is another, but seems optimized for smaller data sets. )

From the CMPH docs:

A perfect hash function maps a static set of n keys into a set of m integer numbers without collisions, where m is greater than or equal to n. If m is equal to n, the function is called minimal.

...

The CMPH Library encapsulates the newest and more efficient algorithms in an easy-to-use, production-quality, fast API. The library was designed to work with big entries that cannot fit in the main memory. It has been used successfully for constructing minimal perfect hash functions for sets with more than 100 million of keys, ...

Upvotes: 3

comingstorm
comingstorm

Reputation: 26117

You may want to look at the Judy tree, which is designed to be both fast and compact, and has a version designed for string keys. Its implementation is available on sourceforge.

Upvotes: 1

Fred Foo
Fred Foo

Reputation: 363617

Use a trie to prevent storing common substrings..

Upvotes: 4

Randy Howard
Randy Howard

Reputation: 2156

Your reason to not use a hash table doesn't sound valid based on the limited information in your question currently. It's fairly efficient if implemented well. It can also have the advantage of not wasting memory storing duplicate strings if that is acceptable for your needs, further reducing memory consumption if duplicate strings are possible.

You could conceivably also store a compressed form of each string in the hash table if you were creative about how you do lookups. How long are the strings typically?

Upvotes: 0

Related Questions