Eduardo
Eduardo

Reputation: 7141

Java: Most efficient way to check if a String is in a wordlist

I have an array of strings String[] words and a 28000 word Word-list.

I want to check if any member of the String array is in the WordList (the word-list is in a text file wordlist.txt)

What is the most efficient way to go about this?

Upvotes: 4

Views: 5277

Answers (8)

Andrejs
Andrejs

Reputation: 27697

A HashSet will suffice if your list of words can fit in memory.

If memory size is a concern use a BloomFilter. While it is possible for bloom filter to give the wrong answer, you can tune the probability with which it happens.

Upvotes: 0

Reimeus
Reimeus

Reputation: 159844

Place the strings directly into a HashSet<String> rather than an array and iterate through the file using contains on the set to check the content. You wont improve on O(1) access. This will also mimimize memory used to store the Strings should any duplicates exist.

Upvotes: 9

Andr&#225;s Hummer
Andr&#225;s Hummer

Reputation: 972

HashSet's add() returns false if the word is already present in the set.

for (String str : words) {
  if (!wordSet.add(str)) {
    System.out.println("The word " + str + " is already contained.");
  }
}

This is a bit more sophisticated and less low-level than contains().

Upvotes: 0

Joop Eggen
Joop Eggen

Reputation: 109577

Store instead of the original words.txt a serialized HashSet. As a separate step from running the application.

The application then only needs to load the hash set once.

Upvotes: 0

TheKojuEffect
TheKojuEffect

Reputation: 21101

Create a HashSet of Strings as

HashSet<String> wordSet = new HashSet<String>(Arrays.asList(words));

And Check for word in HashSet with HashSet.contains(Object o) method where word is the word you want to check if exists.

Upvotes: 0

SmartTechie
SmartTechie

Reputation: 123

Step1:Don't use string array. Instead of use HashSet.

Step2: Load the file(that is wordlist.txt) contents into another HashSet

Step3:

Set<String> set1 = new HashSet<String>(); //Load the string array into set
    Set<String> set2 = new HashSet<String>(); //load the file contents into set
    for (String str : set1) {
        for (String str2 : set2) {
            if (str.equalsIgnoreCase(str2)) {
                break;
            }
        }
    }

Upvotes: 1

Omar Mainegra
Omar Mainegra

Reputation: 4184

You can try the array (tree) suffix algorithm, but you need to implement, look this:

Longest palindrome in a string using suffix tree

Upvotes: 2

Vimal Bera
Vimal Bera

Reputation: 10497

You can use HashSet<String> or ArrayList<String> which has contains method. It will check if your String is stored or not.
The difference between HashSet and ArrayList is hashset won't allow duplicate value and it will not maintain the order while arraylist allows you duplicates and its an ordered collection. But HashSet is more efficient than arraylist to perform a search operations.

Upvotes: 0

Related Questions