Reputation: 1477
I have two strings of equal length, how can I find all the locations where the strings are different?
For example, "HELPMEPLZ" and "HELPNEPLX" are different at positions 4 and 8.
Upvotes: 26
Views: 61417
Reputation: 2200
There is nothing particularly wrong with many of the answers already here. However, if one prefers efficient one-liners, then the following works pretty well (and involves no imports nor additional memory caused via list coercion), while returning the deviating indices in a list
>>> a, b = "HELPMEPLZ", "HELPNEPLX"
>>> [i for i, c in enumerate(zip(a, b)) if c[0] != c[1]]
[4, 8]
What is a little more complex is to return just the first index without having to iterate through the entire list - this can be done with an efficient one-liner as follows (which returns -1 they are the same):
>>> a, b = "HELPMEPLZ", "HELPNEPLX"
>>> # return None if a, b are both empty
>>> # -1 if they are both the same.
>>> # otherwise the index where they first differ.
>>> next((i for i, c in enumerate(zip(a, b)) if c[0] != c[1]), -1)
4
Again, it can be useful to find where a set of two or more strings deviate, and then sets help.
>>> a, b, c = "HELPMEPLZ", "HELPNEPLX", "HELPNFPLX"
>>> strs = a, b, c
>>> [i for i, c in enumerate(zip(*strs)) if len(set(c)) > 1]
[4, 5, 8]
This also works for finding the first, as above..
>>> a, b, c = "HELPMEPLZ", "HELPNEPLX", "HELPNFPLX"
>>> strs = a, b, c
next((i for i, c in enumerate(zip(*strs)) if len(set(c)) > 1), -1)
4
Upvotes: 0
Reputation: 1326
Building on the direction pointed by @FredrikPihl, here you have a solution that is also able to detect insertions/deletions using a module in the Python Standard Library:
import difflib
a = 'HELPMEPLZ'
b = 'HELPNEPLX'
s = difflib.SequenceMatcher(None, a, b, autojunk=False)
for tag, i1, i2, j1, j2 in s.get_opcodes():
if tag != 'equal':
print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
With the output:
replace a[4:5] --> b[4:5] 'M' --> 'N'
replace a[8:9] --> b[8:9] 'Z' --> 'X'
Let's see how it works with a similar example including deletions and additions:
a = 'HELPMEPLZ'
b = 'HLP NEE PLX'
With the output:
delete a[1:2] --> b[1:1] 'E' --> ''
replace a[4:5] --> b[3:5] 'M' --> ' N'
insert a[6:6] --> b[6:8] '' --> 'E '
replace a[8:9] --> b[10:11] 'Z' --> 'X'
Upvotes: 3
Reputation: 236150
Try this:
s1 = 'HELPMEPLZ'
s2 = 'HELPNEPLX'
[i for i in xrange(len(s1)) if s1[i] != s2[i]]
It will return:
> [4, 8]
The above solution will return a list with the indexes in sorted order, won't create any unnecessary intermediate data structures and it will work on Python 2.3 - 2.7. For Python 3.x replace xrange
for range
.
Upvotes: 34
Reputation: 45670
Python really comes with batteries included. Have a look at difflib
>>> import difflib
>>> a='HELPMEPLZ'
>>> b='HELPNEPLX'
>>> s = difflib.SequenceMatcher(None, a, b)
>>> for block in s.get_matching_blocks():
... print block
Match(a=0, b=0, size=4)
Match(a=5, b=5, size=3)
Match(a=9, b=9, size=0)
difflib
is very powerful and a some study of the documentation is really recommended.
Upvotes: 26
Reputation: 86270
If you store the two strings in a
and b
, you can loop through all the items and check for inequality.
python interactive interpreter:
>>> for i in range(len(a)):
... if a[i] != b[i]: print i, a[i], b[i]
...
4 M N
8 Z X
Another way to do this is with list comprehensions. It's all in one line, and the output is a list.
>>> [i for i in range(len(a)) if a[i] != b[i]]
[4, 8]
That makes it really easy to wrap into a function, which makes calling it on a variety of inputs easy.
>>> def dif(a, b):
... return [i for i in range(len(a)) if a[i] != b[i]]
...
>>> dif('HELPMEPLZ', 'HELPNEPLX')
[4, 8]
>>> dif('stackoverflow', 'stacklavaflow')
[5, 6, 7, 8]
Upvotes: 2
Reputation: 13430
>>> from itertools import izip
>>> s1 = 'HELPMEPLZ'
>>> s2 = 'HELPNEPLX'
>>> [i for i,(a1,a2) in enumerate(izip(s1,s2)) if a1!=a2]
[4, 8]
Upvotes: 6
Reputation: 123782
Pair up the strings character-by-character and iterate over this collection together with a counting index. Test whether the characters in each pair differ; if they do, output the index of where.
Using Python builtin functions you can do this neatly in one line:
>>> x = 'HELPMEPLZ'
>>> y = 'HELPNEPLX'
>>> {i for i, (left, right) in enumerate(zip(x,y)) if left != right}
{8, 4}
Upvotes: 2
Reputation: 303
Easiest way is to split data into two char arrays and then loop through comparing the letters and return the index when the two chars do not equal each other.
This method will work fine as long as both strings are equal in length.
Upvotes: 0