Linus Svendsson
Linus Svendsson

Reputation: 1477

Find the position of difference between two strings

I have two strings of equal length, how can I find all the locations where the strings are different?

For example, "HELPMEPLZ" and "HELPNEPLX" are different at positions 4 and 8.

Upvotes: 26

Views: 61417

Answers (8)

Konchog
Konchog

Reputation: 2200

There is nothing particularly wrong with many of the answers already here. However, if one prefers efficient one-liners, then the following works pretty well (and involves no imports nor additional memory caused via list coercion), while returning the deviating indices in a list

>>> a, b = "HELPMEPLZ", "HELPNEPLX"
>>> [i for i, c in enumerate(zip(a, b)) if c[0] != c[1]]
[4, 8]

What is a little more complex is to return just the first index without having to iterate through the entire list - this can be done with an efficient one-liner as follows (which returns -1 they are the same):

>>> a, b = "HELPMEPLZ", "HELPNEPLX"
>>> # return None if a, b are both empty
>>> # -1 if they are both the same.
>>> # otherwise the index where they first differ.
>>> next((i for i, c in enumerate(zip(a, b)) if c[0] != c[1]), -1)
4

Again, it can be useful to find where a set of two or more strings deviate, and then sets help.

>>> a, b, c = "HELPMEPLZ", "HELPNEPLX", "HELPNFPLX"
>>> strs = a, b, c
>>> [i for i, c in enumerate(zip(*strs)) if len(set(c)) > 1]
[4, 5, 8]

This also works for finding the first, as above..

>>> a, b, c = "HELPMEPLZ", "HELPNEPLX", "HELPNFPLX"
>>> strs = a, b, c
next((i for i, c in enumerate(zip(*strs)) if len(set(c)) > 1), -1)
4

Upvotes: 0

khyox
khyox

Reputation: 1326

Building on the direction pointed by @FredrikPihl, here you have a solution that is also able to detect insertions/deletions using a module in the Python Standard Library:

import difflib
a = 'HELPMEPLZ'
b = 'HELPNEPLX'
s = difflib.SequenceMatcher(None, a, b, autojunk=False)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    if tag != 'equal':
        print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
            tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))

With the output:

replace   a[4:5] --> b[4:5]      'M' --> 'N'
replace   a[8:9] --> b[8:9]      'Z' --> 'X'

Let's see how it works with a similar example including deletions and additions:

a = 'HELPMEPLZ'
b = 'HLP NEE PLX'

With the output:

delete    a[1:2] --> b[1:1]      'E' --> ''
replace   a[4:5] --> b[3:5]      'M' --> ' N'
insert    a[6:6] --> b[6:8]       '' --> 'E '
replace   a[8:9] --> b[10:11]      'Z' --> 'X'

Upvotes: 3

Óscar López
Óscar López

Reputation: 236150

Try this:

s1 = 'HELPMEPLZ'
s2 = 'HELPNEPLX'
[i for i in xrange(len(s1)) if s1[i] != s2[i]]

It will return:

> [4, 8]

The above solution will return a list with the indexes in sorted order, won't create any unnecessary intermediate data structures and it will work on Python 2.3 - 2.7. For Python 3.x replace xrange for range.

Upvotes: 34

Fredrik Pihl
Fredrik Pihl

Reputation: 45670

Python really comes with batteries included. Have a look at difflib

>>> import difflib
>>> a='HELPMEPLZ'
>>> b='HELPNEPLX'
>>> s = difflib.SequenceMatcher(None, a, b)
>>> for block in s.get_matching_blocks():
...     print block
Match(a=0, b=0, size=4)
Match(a=5, b=5, size=3)
Match(a=9, b=9, size=0)

difflib is very powerful and a some study of the documentation is really recommended.

Upvotes: 26

Brigand
Brigand

Reputation: 86270

If you store the two strings in a and b, you can loop through all the items and check for inequality.

python interactive interpreter:

>>> for i in range(len(a)):
...   if a[i] != b[i]: print i, a[i], b[i]
... 
4 M N
8 Z X

Another way to do this is with list comprehensions. It's all in one line, and the output is a list.

>>> [i for i in range(len(a)) if a[i] != b[i]]
[4, 8]

That makes it really easy to wrap into a function, which makes calling it on a variety of inputs easy.

>>> def dif(a, b):
...     return [i for i in range(len(a)) if a[i] != b[i]]
...
>>> dif('HELPMEPLZ', 'HELPNEPLX')
[4, 8]
>>> dif('stackoverflow', 'stacklavaflow')
[5, 6, 7, 8]

Upvotes: 2

ovgolovin
ovgolovin

Reputation: 13430

>>> from itertools import izip
>>> s1 = 'HELPMEPLZ'
>>> s2 = 'HELPNEPLX'
>>> [i for i,(a1,a2)  in enumerate(izip(s1,s2)) if a1!=a2]
[4, 8]

Upvotes: 6

Katriel
Katriel

Reputation: 123782

Pair up the strings character-by-character and iterate over this collection together with a counting index. Test whether the characters in each pair differ; if they do, output the index of where.

Using Python builtin functions you can do this neatly in one line:

>>> x = 'HELPMEPLZ'
>>> y = 'HELPNEPLX'
>>> {i for i, (left, right) in enumerate(zip(x,y)) if left != right}
{8, 4}

Upvotes: 2

Dobbo1989
Dobbo1989

Reputation: 303

Easiest way is to split data into two char arrays and then loop through comparing the letters and return the index when the two chars do not equal each other.

This method will work fine as long as both strings are equal in length.

Upvotes: 0

Related Questions