Reputation: 2207
Say I am searching in a string, for a subsequence, where the elements do not necessarily have to be adjacent, but have to occur within N characters. So,
search("abc","aaabbbccc",7) => True
search("abc","aabbcc",3) => False
I am looking for an efficient data structure / algorithm that will perform this comparison. I can think of a few approaches like searching for all valid combos of interior wildcards, like
search("abc",whatever,4) => "abc","a*bc","ab*c"
And using any of the multi-string search algorithms (probably Aho–Corasick), but I'm wondering if there is a better solution.
Upvotes: 1
Views: 932
Reputation: 1141
I have attached a python code sample that does what you want. It loops through the string to be searched and if the first letter of search string is found, a substring of length=max_length is created and sent to another function. This function simply moves through the substring trying to find all of the search string letters in order. If it finds them all then it returns True, otherwise False.
def check_substring(find_me, substr):
find_index = 0
for letter in substr:
if find_me[find_index] == letter:
find_index +=1
# if we reach the end of find_me, return true
if find_index >= len(find_me):
return True
return False
def check_string(find_me, look_here, max_len):
for index in range(len(look_here)):
if find_me[0] == look_here[index]:
if check_substring(find_me, look_here[index:index + max_len]):
return True
return False
fm = "abc"
lh = "aabbbccceee"
ml = 5
print check_string(fm, lh, ml)
Upvotes: 1