Reputation: 23339
I asked to solve the following problem in a programming competition (facebook recruitment)
Input: list of sub-strings:
{"bar","foo","hi"} //from 1 to 100 sub-strings
and the sentence:
"hellothisisfoobarhi" //from 1 to 1000000 character
Output: 12 //the index of the first match in the sentence (foo)
another example would be :
sub-strings:{"hi","hi"}
sentence :"hiJonIamSayinghihiforYou"
output:15 // the index of hi,the second 'hi' because the first 'hi' is just a trick,the sub-sentece is 'hi' hi"
one more ex:
sub-strings:{"hi","foo"}
sentence :"sayingfoohi"
output:6 // the order doesn't matter ,they just need to be beside each others
Who knows a good algorithm for this problem?
Upvotes: 4
Views: 170
Reputation: 19601
Another way - concentrating on the substrings rather than the main string - would be http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
Upvotes: 0
Reputation: 5348
Construct a suffix tree of the large string -- the construction of the tree is O(n) where n is the length of the large string.
Now you can find the location of any of the substrings in O(m) time (where m is the length of the substring) by simply following the the substring down through the tree -- the node or leaf where the substring ends will hold the key corresponding to the index into the large string.
Go through the set of substrings finding their location in the big string, keeping track of the minimum index.
Upvotes: 1