Bin Zhu
Bin Zhu

Reputation: 211

Which data structure should I use?

I want to store some words and their occurrence times in a website, and I don't know which structure I should use.

Every time I add a word in the structure, it first checks if the word already exists, if yes, the occurrence times plus one, if not, add the word into the structure. Thus I can find an element very fast by using this structure. I guess I should use a hashtable or hashmap, right?

And I also want to get a sorted list, thus the structure can be ranked in a short time.

Forgot to mention, I am using Java to write it.

Thanks guys! :)

Upvotes: 0

Views: 157

Answers (5)

Juned Ahsan
Juned Ahsan

Reputation: 68715

Define a Hashmap with word as the key and counter as the value

Map<String,Integer> wordsCountMap = new HashMap<String,Integer>();

Then add the logic like this:

  • When you get a word, check for it in the map using containsKey method
  • If key(word) is found, fetch the value using get and increment the value
  • If key(word) is not found, add the value using thw word as key and put with count 1 as value

Upvotes: 1

veritas
veritas

Reputation: 2444

Any Map Implementation Will Do. If Localized Changes prefer HashMap otherWise ConcurrentHashMap for multithreading.

Remember to use any stemming Library. stemming library in java for example working and work logically are same word.

Remember Integer is immutable see example below Example :

Map<String, Integer> occurrence = new ConcurrentHashMap<String, Integer>();

synchronized void addWord(String word) { // may need to synchronize this method
    String stemmedWord = stem(word);
    Integer count = occurrence.get(stemmedWord)
    if(count == null) {
      count = new Integer(0);
    }
    count ++; 
    occurrence.put(stemmedWord, count); 
   **// the above is necessary as Integer is immutable**

}

Upvotes: 0

Alexander Makarov
Alexander Makarov

Reputation: 886

So, you could use HashMap, but don't forget about multythreading. Is this data structure could be accessed throught few thread? Also, you could use three map in a case that data have some hirarchy (e.g. in a case of rakning and sort it by time). Also, you could look throught google guava collections, probably, they will be more sutabile for you.

Upvotes: 0

Steve P.
Steve P.

Reputation: 14709

A HashMap seems like it would suit you well. If you need a thread-safe option, then go with ConcurrentHashMap.

For example:

Map<String, Integer> wordOccurenceMap = new HashMap<>();

"TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately. Unless you need the entries to be sorted, I'd stick with HashMap." -part of Jon Skeet's answer in TreeMap or HashMap.

Upvotes: 2

RaceBase
RaceBase

Reputation: 18868

TreeMap is the better solution, if you want both Sorting functionality and counting words. Custom Trie can make more efficient but it's not required unless you are modifying the words.

Upvotes: 1

Related Questions