Reputation: 350
I have a list of words with a corresponding score for each word, and I'm matching each individual word in a big block of text from a .txt file against the words in that list. The .txt file can have up to 10,000 lines of text.
When I first did this, I used a very brute force and naive method to match the words on my word list with my .txt file. Although I used a hash map, I did not use the hash map correctly, and might as well have used it as a list. So the code was written in the following manner:
    for(int i=0; i<words.length; i++){
        for(int j=0; j<wordListType.size(); j++){
            Map<String, Integer> hmap = wordListType.get(j).getMap();
            for(Map.Entry<String, Integer> entry : hmap.entrySet()){
                if(words[i].contains(entry.getKey())){
                    foo();
                }
            }
        }
    }
words is a String[] with individual words from the text file. wordListType is an ArrayList of a class that contains a hash map of the keywords that I'm searching for. It's an ArrayList because there are multiple types of word lists. And the getMap() is my own helper method inside the WordList class.
Afterwards, I figured out my code was inefficient, and I was not using my hash map to its full strength. So I changed the code to the following:
    for(int i=0; i<words.length; i++){
        for(int j=0; j<wordListType.size(); j++){
            Map<String, Integer> hmap = wordListType.get(j).getMap();
            Integer val = null;
            if((val = hmap.get(words[i])) != null){
                foo();
            }
        }
    }
This way I'm not going through each and every key in hmap like I do with the first method, and I use the O(1) HashMap.get() method instead.
However, the second, efficient method is not producing the results that I want.
I'm not quite sure why the words are being matched differently though. From what I can see, they should both provide the exact same answers, except my latter code should do so much faster. Instead the first method of iterating through all the keys of the hash map is actually producing the results that I want (and I checked this manually), while the second method does not.
There are NO null values in my hash map, which I have tested for. I've looked up the implementation of hash map, so I don't quite understand why this is not working. Am I missing something here or is there something else that is unrelated that is affecting my results? Any help is much appreciated.
Upvotes: 0
Views: 215
Reputation: 2504
The two if conditions you use do not test the same thing. Let's take an example where words[i] is "tested" and your map contains the key "test":
if(words[i].contains(entry.getKey())) {
This condition checks whether words[i] contains your map key as a substring. "tested" contains "test", so the if block is executed.
if((val = hmap.get(words[i])) != null){
This condition checks whether your map contains the string words[i] ("tested") as a key. It evaluates to false, because the map contains only the key "test".
I believe that for your use case, the second implementation gives the result you are looking for.
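To see the difference in isolation, here is a minimal sketch of both checks run against the same one-entry map. The class name MatchDemo, the helper names, and the score 3 are made up for illustration and are not from the question's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatchDemo {
    // Substring matching (the first loop in the question):
    // collects every key that occurs anywhere inside the word.
    static List<String> substringMatches(String word, Map<String, Integer> scores) {
        List<String> hits = new ArrayList<>();
        for (String key : scores.keySet()) {
            if (word.contains(key)) {
                hits.add(key);
            }
        }
        return hits;
    }

    // Exact matching (the second loop in the question):
    // a single hash lookup that returns null when the word is not a key.
    static Integer exactMatch(String word, Map<String, Integer> scores) {
        return scores.get(word);
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("test", 3); // hypothetical keyword and score

        System.out.println(substringMatches("tested", scores)); // [test]
        System.out.println(exactMatch("tested", scores));       // null
        System.out.println(exactMatch("test", scores));         // 3
    }
}
```

So the two loops only agree when every word in the text is exactly equal to a key.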
Upvotes: 1
Reputation: 46953
If I understand correctly, words[i] is a String. In the first solution you match every word that has the map key as a substring. In the second case you do exact matching.
This one:
words[i].contains(entry.getKey())
Will match every word that has entry.getKey() as a substring, i.e. it will match the word alabala for the key ala.
Over here:
(val = hmap.get(words[i])) != null
Which is better written:
hmap.containsKey(words[i])
You check whether the map contains a key that exactly matches the given word. In this case the key ala will not match the word alabala.
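As a rough sketch of that last point: when the map holds no null values (which the question says is the case), get(...) != null and containsKey(...) give the same answer, and both differ from the substring check. The class and method names below are hypothetical, chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class ContainsKeyDemo {
    // Exact match via get(); only reliable when the map stores no null values.
    static boolean exactViaGet(Map<String, Integer> scores, String word) {
        return scores.get(word) != null;
    }

    // Exact match via containsKey(); states the intent directly.
    static boolean exactViaContainsKey(Map<String, Integer> scores, String word) {
        return scores.containsKey(word);
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("ala", 1); // hypothetical keyword and score

        for (String word : new String[] {"ala", "alabala"}) {
            System.out.println(word
                    + ": get=" + exactViaGet(scores, word)
                    + ", containsKey=" + exactViaContainsKey(scores, word)
                    + ", substring=" + word.contains("ala"));
        }
        // ala: get=true, containsKey=true, substring=true
        // alabala: get=false, containsKey=false, substring=true
    }
}
```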
Upvotes: 1