Reputation: 7141
I have an array of strings String[] words
and a 28000 word Word-list.
I want to check if any member of the String array is in the WordList (the word-list is in a text file wordlist.txt)
What is the most efficient way to go about this?
Upvotes: 4
Views: 5277
Reputation: 27697
A HashSet will suffice if your list of words can fit in memory.
If memory size is a concern use a BloomFilter. While it is possible for bloom filter to give the wrong answer, you can tune the probability with which it happens.
Upvotes: 0
Reputation: 159844
Place the strings directly into a HashSet<String>
rather than an array and iterate through the file using contains
on the set to check the content. You wont improve on O(1) access. This will also mimimize memory used to store the Strings
should any duplicates exist.
Upvotes: 9
Reputation: 972
HashSet
's add()
returns false if the word is already present in the set.
for (String str : words) {
if (!wordSet.add(str)) {
System.out.println("The word " + str + " is already contained.");
}
}
This is a bit more sophisticated and less low-level than contains()
.
Upvotes: 0
Reputation: 109577
Store instead of the original words.txt a serialized HashSet. As a separate step from running the application.
The application then only needs to load the hash set once.
Upvotes: 0
Reputation: 21101
Create a HashSet
of Strings as
HashSet<String> wordSet = new HashSet<String>(Arrays.asList(words));
And Check for word
in HashSet
with HashSet.contains(Object o) method where word
is the word you want to check if exists.
Upvotes: 0
Reputation: 123
Step1:Don't use string array. Instead of use HashSet.
Step2: Load the file(that is wordlist.txt) contents into another HashSet
Step3:
Set<String> set1 = new HashSet<String>(); //Load the string array into set
Set<String> set2 = new HashSet<String>(); //load the file contents into set
for (String str : set1) {
for (String str2 : set2) {
if (str.equalsIgnoreCase(str2)) {
break;
}
}
}
Upvotes: 1
Reputation: 4184
You can try the array (tree) suffix algorithm, but you need to implement, look this:
Longest palindrome in a string using suffix tree
Upvotes: 2
Reputation: 10497
You can use HashSet<String>
or ArrayList<String>
which has contains
method. It will check if your String is stored or not.
The difference between HashSet
and ArrayList
is hashset won't allow duplicate value and it will not maintain the order while arraylist allows you duplicates and its an ordered collection.
But HashSet is more efficient than arraylist to perform a search operations.
Upvotes: 0