Bryce
Bryce

Reputation: 2207

Finding non-adjacent subsequences in a string

Say I am searching in a string, for a subsequence, where the elements do not necessarily have to be adjacent, but have to occur within N characters. So,

search("abc","aaabbbccc",7) => True
search("abc","aabbcc",3) => False

I am looking for an efficient data structure / algorithm that will perform this comparison. I can think of a few approaches like searching for all valid combos of interior wildcards, like

search("abc",whatever,4) => "abc","a*bc","ab*c"

And using any of the multi-string search algorithms (probably Aho–Corasick), but I'm wondering if there is a better solution.

Upvotes: 1

Views: 932

Answers (1)

Jesse Harris
Jesse Harris

Reputation: 1141

I have attached a python code sample that does what you want. It loops through the string to be searched and if the first letter of search string is found, a substring of length=max_length is created and sent to another function. This function simply moves through the substring trying to find all of the search string letters in order. If it finds them all then it returns True, otherwise False.

def check_substring(find_me, substr):
    find_index = 0
    for letter in substr:
        if find_me[find_index] == letter:
            find_index +=1
        # if we reach the end of find_me, return true
        if find_index >= len(find_me):
            return True
    return False

def check_string(find_me, look_here, max_len):
    for index in range(len(look_here)):
        if find_me[0] == look_here[index]:
            if check_substring(find_me, look_here[index:index + max_len]):
                return True
    return False



fm = "abc"
lh = "aabbbccceee"
ml = 5

print check_string(fm, lh, ml)

Upvotes: 1

Related Questions