Reputation: 3645
I am just checking for an efficient algorithm with best computational complexity to check if a child string - tobeVerified
exists in a huge parent string
I was going through different algorithms but I am yet to find something which offers O(n)
I came up with the below implementation using HashSet
which is gives me O(n+m) ~ O(n)
I wanted to check if this is the right way of doing it or if any other optimization is possible. But in this approach there is problem of consuming more space
String parent = "the value is very high";
String tobeVerified = "is";
Set wordSet = new HashSet<String>();
String[] words = parent.trim().toUpperCase().split("\\s+");
//This is O(n) n - Parent Size m - substring size
for(String word: words){
wordSet.add(word);
}
//This is O(1)
System.out.println(wordSet.contains(tobeVerified.toUpperCase()));
}
Upvotes: 1
Views: 3200
Reputation: 104474
One of the classic O(n+m)
substring search algorithms is Boyer-Moore. It should have better performance than String.contains
or String.indexOf
for sufficiently large strings.
There's a Java implementation of the algorithm on that wikipedia page link above, but it's written to use a char[]
array as input instead of on an instance of the String class. So either modify the code to work with a String parameter , or take into account the additional cost, O(n), of cloning a String
into a char[]
.
One little issue I spotted on the wikipedia code. It assumes character values are only in the 8-bit range. You might need to modify this line:
final int ALPHABET_SIZE = 256;
To be this:
final int ALPHABET_SIZE = 65536;
Update: I updated the wikipedia page code appropriately to have the correct value for ALPHABET_SIZE
. Confirmed the original bug existed and wrote a unit test to validate the fix.
Upvotes: 3
Reputation: 159086
You could go for a Boyer-Moore implementation, as suggested in answer by selbie, if profiling shows that you truly have a performance issue.
Until then, just do a simple regex search:
String textToSearch = "the value is very high";
String wordToFind = "is";
String regex = "(?i)\\b" + Pattern.quote(wordToFind) + "\\b";
boolean found = Pattern.compile(regex).matcher(textToSearch).find();
(?i)
makes the search case-insensitive, and \\b
matches word-boundary, e.g. ensuring that is
will not match this
. Since you're doing word-search, Pattern.quote()
is likely unnecessary, but better be safe than sorry.
Upvotes: 1