Satheesh Cheveri
Satheesh Cheveri

Reputation: 3679

Building data structure for Dictionary

I am looking for some high level thoughts/ideas help me building a data structure for Dictionary. I have a legacy 'product (medicine) search system' which is very slow and complex in nature. We would need to completely re-architect the system for efficient and maintainable solution.

To simplify the question, I take an example of 'Dictionary' ( I expect my new system behaves like Dictionary )

  1. I should be able to store Word,description and few synonyms ( equivalent generic medicine),
  2. Words should not duplicate
  3. Synonyms will also be instance of Word ( it should carry behaviour of word, description and synonyms).
  4. Faster searching

UseCases

  1. When a word is searched, its meaning and synonyms displayed
  2. Faster Search
  3. Removal of synonym should be possible
  4. Adding new word, should be able to add to any existing word's synonyms

I created a data structure shown below

Class Word {
    String meaning;
    List<Word> synonyms;
}

To store Words, I am thinking to use TreeSet

because

TreeSet provides an implementation of the Set interface that uses a tree for storage. Objects are stored in sorted, ascending order. Access and retrieval times are quite fast, which makes TreeSet an excellent choice when storing large amounts of sorted information that must be found quickly.

Or I can use HashMap, where the hashcode of word and synonyms word instance made equal which could enable faster retrieval.

Still I could see lot of challenges

  1. When ever new word is added how to link with its synonyms

  2. Look up would be slow when there are huge number of words

  3. Editing word also should reflect synonyms and vice versa

Any thoughts/inputs/tricks would be highly valued

Upvotes: 4

Views: 3433

Answers (2)

harsh
harsh

Reputation: 7692

For word search and word completion requirement Trie would be a fast alternative. Take a look at Java implementations:

In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.

http://pathakalgo.blogspot.in/2012/11/trie-data-structure-implementation-in.html

https://www.google.co.in/search?q=Trie&client=ubuntu&channel=cs&oq=Trie&aqs=chrome..69i57j69i60l2.856j0j1&sourceid=chrome&ie=UTF-8

For synonym linkage, you can maintain a Map<String, LinkedList<String>>. Once a word is found using Trie, fetching associated sysnonyms would be O(1).

Upvotes: 2

You could use Trie to store all words in dictionary. Add a list of synonims for each word (node).

Upvotes: 2

Related Questions