Ashwin Surana

Reputation: 876

How to search for a given word in a huge database?

What's the most efficient way to search for a word in a dictionary database? I searched for an answer, and people suggested using a trie data structure. But building the trie for a huge number of words would mean loading all of them into main memory. I am trying to make an Android app that involves this implementation for my data structures project. So could anyone tell me how such dictionaries work?

Even when I use the T9 dictionary on my phone, the word suggestions appear on the screen very quickly. I'm curious to know the algorithm and the design behind it.

Upvotes: 10

Views: 2383

Answers (3)

Ashwin Surana

Reputation: 876

Using a trie is indeed memory hungry. I realized this when I checked my RAM usage after loading 150,000 words into a trie: the usage was 150 MB (the trie was implemented in C++). Most of the memory consumption was due to pointers. I ended up with ternary tries, which wasted far less memory, around 30 MB (compared to 150 MB), but the lookup time increased a bit. Another option is the "left-child right-sibling" representation, which wastes very little memory but has slower lookups than a ternary trie.
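To illustrate the ternary trie idea, here is a minimal sketch in Python (not the C++ code the numbers above came from). Each node holds one character and only three links, instead of one pointer per letter of the alphabet. The class and method names are made up for this example.

```python
class TSTNode:
    """One character plus three links: smaller, equal (next char), larger."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_word")

    def __init__(self, ch):
        self.ch = ch
        self.lo = None        # subtree for characters smaller than ch
        self.eq = None        # subtree for the next character of the word
        self.hi = None        # subtree for characters larger than ch
        self.is_word = False  # True if a stored word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:  # the empty string is not stored
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def contains(self, word):
        if not word:
            return False
        node, i = self.root, 0
        while node is not None:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node = node.eq
                i += 1
            else:
                return node.is_word
        return False
```

The extra comparisons at each character (walking `lo`/`hi` links) are where the slightly slower lookups come from.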

Upvotes: -1

Saeed Amiri

Reputation: 22555

You can use a trie, which is most useful for searching big dictionaries. Because so many words share the same beginning, a trie brings the search down to roughly constant time (proportional to the length of the word, not the size of the dictionary). You can also use it in place, with a limited number of accesses to physical memory. You can find lots of implementations on the web.

If someone is not familiar with trie, I think this site is good and I'm just quoting their sample here:

A trie (from retrieval), is a multi-way tree structure useful for storing strings over an alphabet. It has been used to store large dictionaries of English (say) words in spelling-checking programs and in natural-language "understanding" programs. Given the data:

an, ant, all, allot, alloy, aloe, are, ate, be 

the corresponding trie would be: [figure: sample trie for the above words]
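For readers who prefer code to a picture, here is a minimal Python sketch of a trie built from those same nine words. It is not taken from the quoted site or from the implementation linked below. The names `Trie`, `contains` and `with_prefix` are just chosen for this example.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix):
        """All stored words starting with prefix, in sorted order."""
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)


trie = Trie(["an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be"])
```

A lookup touches one node per character of the query, and `with_prefix` is the kind of operation a suggestion list would be built on.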

This is a good practical trie implementation in Java: http://code.google.com/p/google-collections/issues/detail?id=5

Upvotes: 8

Ibolit

Reputation: 9720

There are lots of ways to do that. The one that I used some time ago (which is especially good if you don't make changes to your dictionary) is to create a prefix index.

That is, you sort your entries lexicographically. Then you save the (end) positions of the ranges for the different first letters. For example, if your entries have indexes from 1 to 1000, and the words from "aardvark" through "azerbaijan" take the range from 1 to 200, you make an entry "a | 200" in a separate table. Then you do the same for the first two letters. When you need to find a particular word, this greatly reduces the search scope. In my case, the index on the first two letters was quite sufficient.
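A rough in-memory sketch of this prefix index in Python is shown below. It is an illustration only: it uses 0-based half-open ranges `[start, end)` rather than the 1-based end positions in the example above, and the function names and the parameter `k` (number of indexed letters) are invented for this sketch.

```python
from bisect import bisect_left


def build_prefix_index(sorted_words, k=2):
    """Map each k-letter prefix to the [start, end) range it occupies.

    Assumes sorted_words is already sorted, so all words sharing a
    prefix sit next to each other.
    """
    index = {}
    for i, word in enumerate(sorted_words):
        key = word[:k]
        start, _ = index.get(key, (i, i))
        index[key] = (start, i + 1)
    return index


def lookup(sorted_words, index, word, k=2):
    """Binary-search only inside the range recorded for the word's prefix."""
    bounds = index.get(word[:k])
    if bounds is None:
        return False
    start, end = bounds
    pos = bisect_left(sorted_words, word, start, end)
    return pos < end and sorted_words[pos] == word


words = sorted(["an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be"])
index = build_prefix_index(words)
```

With the index in hand, a lookup for "alloy" only searches the four entries that start with "al" instead of the whole list.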

Again, this method requires you to use a DB, like SQLite, which I think is available on Android.
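As a sketch of what the database side could look like, here is a small example using Python's built-in `sqlite3` module as a stand-in for Android's SQLite API. The table name `words`, the helper names, and the upper-bound trick for prefix ranges are assumptions of this example, not part of the method described above. Here the database's own index on the primary key plays the role of the hand-built range table.

```python
import sqlite3


def open_dictionary(words):
    """Create an in-memory table; the PRIMARY KEY gives an index on word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        ((w,) for w in words),
    )
    conn.commit()
    return conn


def has_word(conn, word):
    row = conn.execute(
        "SELECT 1 FROM words WHERE word = ?", (word,)
    ).fetchone()
    return row is not None


def suggestions(conn, prefix, limit=10):
    """Words starting with prefix, found via a range scan on the index.

    Assumption: appending the largest code point gives an upper bound
    that sorts after every realistic word with this prefix.
    """
    upper = prefix + "\U0010ffff"
    rows = conn.execute(
        "SELECT word FROM words WHERE word >= ? AND word < ? "
        "ORDER BY word LIMIT ?",
        (prefix, upper, limit),
    )
    return [row[0] for row in rows]
```

Because the words live in the database file rather than in a structure built at startup, only the pages touched by a query need to be read.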

Upvotes: 0
