Reputation: 97
So i started with the code from an answer to this question Function to find all common substrings in two strings not giving correct output and modified it a little bit to accommodate case-independence (i.e. AbCd is the same as ABCD as Abcd and so on) by turning the string to lowercase. However, for strings like 'ABCDXGHIJ'
and 'ghijYAbCd'
, it only returns ['ghij']
, not the desired output ['ABCD', 'GHIJ']
.
Here are other examples:
'Bonywasawarrior'
and 'Bonywasxwarrior'
(output: ['Bonywas', 'warrior', 'wa']
, desired output: ['Bonywas', 'warrior']
)'01101001'
and '101010'
(output: ['1010', '0', '1010', '01', '10', '01']
, desired output: ['1010']
)here is my code:
t = int(input()) #t cases
while t > 0:
A = str(input()) #1st string
B = str(input()) #2nd string
low_A = A.lower()
low_B = B.lower()
answer = ""
anslist=[]
for i in range(len(A)):
common = ""
for j in range(len(B)):
if (i + j < len(A) and low_A[i + j] == low_B[j]):
common += B[j]
else:
#if (len(common) > len(answer)):
answer = common
if answer != '' and len(answer) > 1:
anslist.append(answer)
common = ""
if common != '':
anslist.append(common)
if len(anslist) == 0:
print('[]') #print if no common substring
else:
print(anslist)
t -= 1
Upvotes: 0
Views: 1607
Reputation: 106648
You can increment an offset in a while
loop to keep concatenating common characters at the offset from the respective indices until they become different instead. To find the longest, non-overlapping common substrings, you can use a function that recursively traverses different paths of substring partitioning, and returns the one with the longest lengths of substrings:
def common_strings(a, b, i=0, j=0):
candidates = []
len_a = len(a)
len_b = len(b)
if j == len_b:
candidates.append(common_strings(a, b, i + 1, 0))
elif i < len_a:
offset = 0
while i + offset < len_a and j + offset < len_b and a[i + offset].lower() == b[j + offset].lower():
offset += 1
if offset > 1:
candidates.append([a[i: i + offset]] + common_strings(a, b, i + offset, j + offset))
candidates.append(common_strings(a, b, i, j + 1))
return candidates and max(candidates, key=lambda t: sorted(map(len, t), reverse=True))
so that:
print(common_strings('ABCDXGHIJ', 'ghijYAbCd'))
print(common_strings('Bonywasawarrior', 'Bonywasxwarrior'))
print(common_strings('01101001', '101010'))
outputs:
['ABCD', 'GHIJ']
['Bonywas', 'warrior']
['1010']
Upvotes: 1
Reputation: 44148
This is a duplicate of Finding all the common substrings of given two strings, which offers a solution in Java and for which I have done my best to translate to Python with the "enhancement" of making it case-insensitive:
def find_common(s, t):
table = [len(t)*[0] for i in range(len(s))]
longest = 0
result = set()
for i, ch1 in enumerate(s.lower()):
for j, ch2 in enumerate(t.lower()):
if ch1 != ch2:
continue
table[i][j] = 1 if i == 0 or j == 0 else 1 + table[i - 1][j - 1]
if table[i][j] > longest:
longest = table[i][j]
result.clear()
if table[i][j] == longest:
result.add(s[i - longest + 1:i + 1]);
return result
print(find_common('Bonywasawarrior', 'Bonywasxwarrior'))
print(find_common('01101001', '101010'))
print(find_common('ABCDXGHIJ', 'ghijYAbCd'))
Prints:
{'Bonywas', 'warrior'}
{'1010'}
{'GHIJ', 'ABCD'}
Upvotes: 0