Eugene
Eugene

Reputation: 97

Find all the common substrings between two strings, regardless of case and order

So i started with the code from an answer to this question Function to find all common substrings in two strings not giving correct output and modified it a little bit to accommodate case-independence (i.e. AbCd is the same as ABCD as Abcd and so on) by turning the string to lowercase. However, for strings like 'ABCDXGHIJ' and 'ghijYAbCd', it only returns ['ghij'], not the desired output ['ABCD', 'GHIJ'].

Here are other examples:

here is my code:

t = int(input()) #t cases

while t > 0:
    A = str(input()) #1st string
    B = str(input()) #2nd string

    low_A = A.lower()
    low_B = B.lower()

    answer = ""
    anslist=[]
    for i in range(len(A)):
        common = ""
        for j in range(len(B)):
            if (i + j < len(A) and low_A[i + j] == low_B[j]):
                common += B[j]
            else:
                #if (len(common) > len(answer)): 
                answer = common
                if answer != '' and len(answer) > 1:
                    anslist.append(answer)
                common = ""

        if common != '':
            anslist.append(common)

    if len(anslist) == 0:
        print('[]') #print if no common substring
    else:
        print(anslist)
    t -= 1

Upvotes: 0

Views: 1607

Answers (2)

blhsing
blhsing

Reputation: 106648

You can increment an offset in a while loop to keep concatenating common characters at the offset from the respective indices until they become different instead. To find the longest, non-overlapping common substrings, you can use a function that recursively traverses different paths of substring partitioning, and returns the one with the longest lengths of substrings:

def common_strings(a, b, i=0, j=0):
    candidates = []
    len_a = len(a)
    len_b = len(b)
    if j == len_b:
        candidates.append(common_strings(a, b, i + 1, 0))
    elif i < len_a:
        offset = 0
        while i + offset < len_a and j + offset < len_b and a[i + offset].lower() == b[j + offset].lower():
            offset += 1
        if offset > 1:
            candidates.append([a[i: i + offset]] + common_strings(a, b, i + offset, j + offset))
        candidates.append(common_strings(a, b, i, j + 1))
    return candidates and max(candidates, key=lambda t: sorted(map(len, t), reverse=True))

so that:

print(common_strings('ABCDXGHIJ', 'ghijYAbCd'))
print(common_strings('Bonywasawarrior', 'Bonywasxwarrior'))
print(common_strings('01101001', '101010'))

outputs:

['ABCD', 'GHIJ']
['Bonywas', 'warrior']
['1010']

Upvotes: 1

Booboo
Booboo

Reputation: 44148

This is a duplicate of Finding all the common substrings of given two strings, which offers a solution in Java and for which I have done my best to translate to Python with the "enhancement" of making it case-insensitive:

def find_common(s, t):
    table = [len(t)*[0] for i in range(len(s))]
    longest = 0
    result = set()
    for i, ch1 in enumerate(s.lower()):
        for j, ch2 in enumerate(t.lower()):
            if ch1 != ch2:
                continue
            table[i][j] = 1 if i == 0 or j == 0 else 1 + table[i - 1][j - 1]
            if table[i][j] > longest:
                longest = table[i][j]
                result.clear()
            if table[i][j] == longest:
                result.add(s[i - longest + 1:i + 1]);
    return result


print(find_common('Bonywasawarrior', 'Bonywasxwarrior'))
print(find_common('01101001', '101010'))
print(find_common('ABCDXGHIJ', 'ghijYAbCd'))

Prints:

{'Bonywas', 'warrior'}
{'1010'}
{'GHIJ', 'ABCD'}

Upvotes: 0

Related Questions