Thuc Hung
Thuc Hung

Reputation: 130

Searching for a set of Strings contain a particular string from ArrayList in Java

Is there any fast algorithm to search in an Arraylist of String for a particular string?

For example :

I have an Arraylist :

{"white house","yellow house","black door","house in heaven","wife"}

And want to search strings contain "house". It should return {"white house","yellow house","house in heaven"} but in a minimum time. I mean my problem is to deal with big data (a list of about 167000 strings) without index.

Thanks!

Upvotes: 1

Views: 1473

Answers (2)

Weibo Li
Weibo Li

Reputation: 3605

I agree with Josh Engelsma. Iterate the list and check one by one is the most simple way. And 167000 is really not a quite big data, unless each String in the List is quite long. Liner search algorithm can be finished in only a few seconds in normal PC.

Consider the coding conventions, the code may be like this:

for(String s : list) {
    if(s.contains.("house")) {
        //do sth.
    }
}

If search will be performed many times on the same list with different keywords, you can build a reverse index to speed up searching.

In your example:

{"white house","yellow house","black door","house in heaven","wife"}

You could pre-process the list, separate each sentence into words, and build an index like:

"house" --> {0,1,3}
"white" --> {0}
"yellow" --> {1}
...

which means "house" is contained in the 0,1 and 3 -th elements of the list, and so on. The index can be implemented with HashMap:

Map<String, LinkedList<Integer>> = new HashMap<String, LinkedList<Integer>>();

And the search operation will be speedup to O(1) complexity ideally.

Upvotes: 1

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726479

There are two answers to your question, depending on whether you are planning to run multiple queries or not:

  • If you need to run the query only once, you are out of luck: you must search the entire array from the beginning to the end.
  • If you need to run a significant number of queries, you can reduce the amount of work by building an index.

Make a data structure Map<String,List<String>>, go through the strings in your List<String>, and split them into words. For each word on the list of tokens, add the original string to the corresponding list.

This operation runs in O(N*W), where N is the number of long strings, and W is the average number of words per string. With such map in hand you could run a query in O(1).

Note that this approach pays off only when the number of queries significantly exceeds the average number of words in each string. For example, if your strings have ten words on the average, and you need to run five to eight queries, a linear search would be faster.

Upvotes: 1

Related Questions