Martin
Martin

Reputation: 40583

How to store 50,000 English words so that it takes as little memory as possible

I have to store ~50,000 English words in memory and I'd like to know what would be the best data structure in term of memory footprint (and loading speed). Would it be a Trie? How would I serialize it into a file? Is there anything better than that?

Essentially, once the ~50,000 words are loaded into memory, I simply need to check if the word exists or not.

Upvotes: 1

Views: 1500

Answers (4)

seldary
seldary

Reputation: 6256

Well, according to your provided guidelines, a simple List would be better.

Fetching time would be obviously slower than a Trie or Dictionary, but

"in term of memory footprint (and loading speed)"

It will require very little memory overhead, and will load faster (as no indexes / prefix data structures are built).

See this blog post for some memory comparison details (In JavaScript, but still applies).

Upvotes: 1

mohsensajjadi
mohsensajjadi

Reputation: 335

A Dictionary object is proposed. Read these:

Most efficient in-memory data structure for read-only dictionary access

Why is Dictionary preferred over hashtable?

For Help on implementation read this:

http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

For serializing the dictionary object or the hash table read this reference:

http://blogs.msdn.com/b/adam/archive/2010/09/10/how-to-serialize-a-dictionary-or-hashtable-in-c.aspx

Upvotes: 0

Joey
Joey

Reputation: 354586

Yes, a trie sounds ok for this. For serialising you'd have two options:

  1. Use the original word list and rebuild the trie. It should be fast enough, I guess, but you may want to profile it.
  2. Just use normal .NET serialisation for the type and dump it to a file. This prevents programs in other languages from reading it, though.

Upvotes: 0

npinti
npinti

Reputation: 52185

According to this answer, the Dictionary class is what you need. As per the MSDN documentation, you should use the TryGetValue method to access your data:

Use the TryGetValue method if your code frequently attempts to access keys that are not in the dictionary. Using this method is more efficient than catching the KeyNotFoundException thrown by the Item property.

Upvotes: 0

Related Questions