user1712
user1712

Reputation: 67

Why is the code not working for strings with same length?

Following is the code that I have written that counts the number of substrings of length 2 that are common to both the input strings.Also the substrings should be at the same location in both the strings.

def string_match(a, b):
  count=0
  shorter=min(len(a),len(b))
  for i in range(shorter):
    if(a[i:i+2]==b[i:i+2]):
      count=count+1
    else:
      continue
  return count

The code runs fine for strings with different length but gives wrong answer for strings with same length. for eg: 'abc' and 'abc' should return 2 but it is returning 3 and also 'abc' and 'axc' should return 0 but it is returning 1. The above problem can be solved by changing range(shorter) to range(shorter-1), but I am not understanding why? Also if possible suggest me changes in the above code that can count same substrings regardless of the positions in the two strings.

Thank you in advance!

Upvotes: 0

Views: 184

Answers (3)

Neil A.
Neil A.

Reputation: 841

Examine your for loop

for i in range(shorter):
    if a[i:i+2]==b[i:i+2]:
        count=count+1
    else:
        continue

range(n) by default goes from 0 to n-1. So what happens in the case of n-1? Your loop is attempting to access the n-1th to n+1th characters. But the smaller string only has n characters. So Python simply returns that letter instead of two letters, and so two strings of equal length with the same last character would give a false positive. This is why range(shorter - 1) is necessary.

Also the use of continue is redundant as by default the loop will continue anyways

To find substrings of length 2 anywhere in the strings this should suffice

def string_match(string1, string2):
    string1subs = [string1[i:i+2] for i in range(len(string1) - 1)]
    count = 0
    for i in range(len(string2) - 1):
        if string2[i:i+2] in string1subs: count += 1
    return count

Creates a list string1subs that contains all substrings of length 2 in string1. Then loops through all substrings of length 2 in string2 and checks if it is a substring of string1. If you prefer a more concise version:

def string_match(string1, string2):
    string1subs = [string1[i:i+2] for i in range(len(string1) - 1)]
    return sum(string2[i:i+2] in string1subs for i in range(len(string2) - 1))

Exact same version using sum and the fact that in Python, True is equal to 1.

Upvotes: 2

Daniel
Daniel

Reputation: 42748

Best way is not to use any index access at all:

def string_match(a, b):
    count = 0
    equal = False
    for c, d in zip(a,b):
        count += equal and c == d
        equal = c == d
    return count

or with generator expression:

from itertools import islice
def string_match(a, b):
    return sum(a1 == b1 and a2 == b2
        for a1, a2, b1, b2 in zip(a, islice(a,1,None), b, islice(b,1,None)))

Upvotes: -2

Snild Dolkow
Snild Dolkow

Reputation: 6866

Some good old print debugging should make things clearer:

#!/usr/bin/env python2
#coding=utf8

def string_match(a, b):
    count=0
    shorter=min(len(a),len(b))
    print 'comparing', a, b
    for i in range(shorter):
        x = a[i:i+2]
        y = b[i:i+2]
        print 'checking substrings at %d: ' % i, x, y
        if x == y:
            count=count+1
        else:
            continue
    return count


for a, b in (('abc', 'abc'), ('abc', 'axc')):
    count = string_match(a,b)
    print a, b, count

And the output:

so$ ./test.py 
comparing abc abc
checking substrings at 0:  ab ab
checking substrings at 1:  bc bc
checking substrings at 2:  c c
abc abc 3
comparing abc axc
checking substrings at 0:  ab ax
checking substrings at 1:  bc xc
checking substrings at 2:  c c
abc axc 1

See the problem? You're always comparing a substring of length 1 at the end. This is because 'abc'[2:4] will give you just 'c'.

So, you'd need to end one step earlier (or, more generally, n-1 steps earlier when you're comparing substrings of length n). This is exactly what your -1 change would do, which is why it helps.

With the -1 change:

#!/usr/bin/env python2
#coding=utf8

def string_match(a, b):
    count=0
    shorter=min(len(a),len(b))
    print 'comparing', a, b
    for i in range(shorter-1):
        x = a[i:i+2]
        y = b[i:i+2]
        print 'checking substrings at %d: ' % i, x, y
        if x == y:
            count=count+1
        else:
            continue
    return count


for a, b in (('abc', 'abc'), ('abc', 'axc')):
    count = string_match(a,b)
    print a, b, count

And the new output:

so$ ./test.py 
comparing abc abc
checking substrings at 0:  ab ab
checking substrings at 1:  bc bc
abc abc 2
comparing abc axc
checking substrings at 0:  ab ax
checking substrings at 1:  bc xc
abc axc 0

Upvotes: 4

Related Questions