Reputation: 67
Following is the code that I have written that counts the number of substrings of length 2 that are common to both the input strings.Also the substrings should be at the same location in both the strings.
def string_match(a, b):
count=0
shorter=min(len(a),len(b))
for i in range(shorter):
if(a[i:i+2]==b[i:i+2]):
count=count+1
else:
continue
return count
The code runs fine for strings with different length but gives wrong answer for strings with same length. for eg: 'abc' and 'abc' should return 2 but it is returning 3 and also 'abc' and 'axc' should return 0 but it is returning 1. The above problem can be solved by changing range(shorter) to range(shorter-1), but I am not understanding why? Also if possible suggest me changes in the above code that can count same substrings regardless of the positions in the two strings.
Thank you in advance!
Upvotes: 0
Views: 184
Reputation: 841
Examine your for
loop
for i in range(shorter):
if a[i:i+2]==b[i:i+2]:
count=count+1
else:
continue
range(n)
by default goes from 0 to n-1
. So what happens in the case of n-1
? Your loop is attempting to access the n-1
th to n+1
th characters. But the smaller string only has n
characters. So Python simply returns that letter instead of two letters, and so two strings of equal length with the same last character would give a false positive. This is why range(shorter - 1)
is necessary.
Also the use of continue
is redundant as by default the loop will continue anyways
To find substrings of length 2 anywhere in the strings this should suffice
def string_match(string1, string2):
string1subs = [string1[i:i+2] for i in range(len(string1) - 1)]
count = 0
for i in range(len(string2) - 1):
if string2[i:i+2] in string1subs: count += 1
return count
Creates a list string1subs
that contains all substrings of length 2 in string1
. Then loops through all substrings of length 2 in string2
and checks if it is a substring of string1
. If you prefer a more concise version:
def string_match(string1, string2):
string1subs = [string1[i:i+2] for i in range(len(string1) - 1)]
return sum(string2[i:i+2] in string1subs for i in range(len(string2) - 1))
Exact same version using sum
and the fact that in Python, True
is equal to 1.
Upvotes: 2
Reputation: 42748
Best way is not to use any index access at all:
def string_match(a, b):
count = 0
equal = False
for c, d in zip(a,b):
count += equal and c == d
equal = c == d
return count
or with generator expression:
from itertools import islice
def string_match(a, b):
return sum(a1 == b1 and a2 == b2
for a1, a2, b1, b2 in zip(a, islice(a,1,None), b, islice(b,1,None)))
Upvotes: -2
Reputation: 6866
Some good old print debugging should make things clearer:
#!/usr/bin/env python2
#coding=utf8
def string_match(a, b):
count=0
shorter=min(len(a),len(b))
print 'comparing', a, b
for i in range(shorter):
x = a[i:i+2]
y = b[i:i+2]
print 'checking substrings at %d: ' % i, x, y
if x == y:
count=count+1
else:
continue
return count
for a, b in (('abc', 'abc'), ('abc', 'axc')):
count = string_match(a,b)
print a, b, count
And the output:
so$ ./test.py
comparing abc abc
checking substrings at 0: ab ab
checking substrings at 1: bc bc
checking substrings at 2: c c
abc abc 3
comparing abc axc
checking substrings at 0: ab ax
checking substrings at 1: bc xc
checking substrings at 2: c c
abc axc 1
See the problem? You're always comparing a substring of length 1 at the end. This is because 'abc'[2:4]
will give you just 'c'
.
So, you'd need to end one step earlier (or, more generally, n-1
steps earlier when you're comparing substrings of length n
). This is exactly what your -1
change would do, which is why it helps.
With the -1
change:
#!/usr/bin/env python2
#coding=utf8
def string_match(a, b):
count=0
shorter=min(len(a),len(b))
print 'comparing', a, b
for i in range(shorter-1):
x = a[i:i+2]
y = b[i:i+2]
print 'checking substrings at %d: ' % i, x, y
if x == y:
count=count+1
else:
continue
return count
for a, b in (('abc', 'abc'), ('abc', 'axc')):
count = string_match(a,b)
print a, b, count
And the new output:
so$ ./test.py
comparing abc abc
checking substrings at 0: ab ab
checking substrings at 1: bc bc
abc abc 2
comparing abc axc
checking substrings at 0: ab ax
checking substrings at 1: bc xc
abc axc 0
Upvotes: 4