So i started with the code from an answer to this question Function to find all common substrings in two strings not giving correct output and modified it a little bit to accommodate case-independence (i.e. AbCd is the same as ABCD as Abcd and so on) by turning the string to lowercase. However, for strings like 'ABCDXGHIJ' and 'ghijYAbCd' , it only returns ['ghij'] , not the desired output ['ABCD', 'GHIJ'] . Here are other examples: 'Bonywasawarrior' and 'Bonywasxwarrior' (output: ['Bonywas', 'warrior', 'wa'] , desired output: ['Bonywas', 'warrior'] ) '01101001' and '101010' (output: ['1010', '0', '1010', '01', '10', '01'] , desired output: ['1010'] ) here is my code: t = int(input()) #t cases while t > 0: A = str(input()) #1st string B = str(input()) #2nd string low_A = A.lower() low_B = B.lower() answer = "" anslist=[] for i in range(len(A)): common = "" for j in range(len(B)): if (i + j < len(A) and low_A[i + j] == low_B[j]): common += B[j] else: #if (len(common) > len(answer)): answer = common if answer != '' and len(answer) > 1: anslist.append(answer) common = "" if common != '': anslist.append(common) if len(anslist) == 0: print('[]') #print if no common substring else: print(anslist) t -= 1

Reputation: 97

Find all the common substrings between two strings, regardless of case and order

So i started with the code from an answer to this question Function to find all common substrings in two strings not giving correct output and modified it a little bit to accommodate case-independence (i.e. AbCd is the same as ABCD as Abcd and so on) by turning the string to lowercase. However, for strings like 'ABCDXGHIJ' and 'ghijYAbCd', it only returns ['ghij'], not the desired output ['ABCD', 'GHIJ'].

Here are other examples:

'Bonywasawarrior' and 'Bonywasxwarrior' (output: ['Bonywas', 'warrior', 'wa'], desired output: ['Bonywas', 'warrior'])
'01101001' and '101010' (output: ['1010', '0', '1010', '01', '10', '01'], desired output: ['1010'])

here is my code:

t = int(input()) #t cases

while t > 0:
    A = str(input()) #1st string
    B = str(input()) #2nd string

    low_A = A.lower()
    low_B = B.lower()

    answer = ""
    anslist=[]
    for i in range(len(A)):
        common = ""
        for j in range(len(B)):
            if (i + j < len(A) and low_A[i + j] == low_B[j]):
                common += B[j]
            else:
                #if (len(common) > len(answer)): 
                answer = common
                if answer != '' and len(answer) > 1:
                    anslist.append(answer)
                common = ""

        if common != '':
            anslist.append(common)

    if len(anslist) == 0:
        print('[]') #print if no common substring
    else:
        print(anslist)
    t -= 1

Upvotes: 0

Answers (2)

blhsing

Reputation: 106648

You can increment an offset in a while loop to keep concatenating common characters at the offset from the respective indices until they become different instead. To find the longest, non-overlapping common substrings, you can use a function that recursively traverses different paths of substring partitioning, and returns the one with the longest lengths of substrings:

def common_strings(a, b, i=0, j=0):
    candidates = []
    len_a = len(a)
    len_b = len(b)
    if j == len_b:
        candidates.append(common_strings(a, b, i + 1, 0))
    elif i < len_a:
        offset = 0
        while i + offset < len_a and j + offset < len_b and a[i + offset].lower() == b[j + offset].lower():
            offset += 1
        if offset > 1:
            candidates.append([a[i: i + offset]] + common_strings(a, b, i + offset, j + offset))
        candidates.append(common_strings(a, b, i, j + 1))
    return candidates and max(candidates, key=lambda t: sorted(map(len, t), reverse=True))

so that:

print(common_strings('ABCDXGHIJ', 'ghijYAbCd'))
print(common_strings('Bonywasawarrior', 'Bonywasxwarrior'))
print(common_strings('01101001', '101010'))

outputs:

['ABCD', 'GHIJ']
['Bonywas', 'warrior']
['1010']

Upvotes: 1

Booboo

Reputation: 44148

This is a duplicate of Finding all the common substrings of given two strings, which offers a solution in Java and for which I have done my best to translate to Python with the "enhancement" of making it case-insensitive:

def find_common(s, t):
    table = [len(t)*[0] for i in range(len(s))]
    longest = 0
    result = set()
    for i, ch1 in enumerate(s.lower()):
        for j, ch2 in enumerate(t.lower()):
            if ch1 != ch2:
                continue
            table[i][j] = 1 if i == 0 or j == 0 else 1 + table[i - 1][j - 1]
            if table[i][j] > longest:
                longest = table[i][j]
                result.clear()
            if table[i][j] == longest:
                result.add(s[i - longest + 1:i + 1]);
    return result


print(find_common('Bonywasawarrior', 'Bonywasxwarrior'))
print(find_common('01101001', '101010'))
print(find_common('ABCDXGHIJ', 'ghijYAbCd'))

Prints:

{'Bonywas', 'warrior'}
{'1010'}
{'GHIJ', 'ABCD'}

Upvotes: 0

Find all the common substrings between two strings, regardless of case and order

Answers (2)

Related Questions