Reputation: 3774
I am patching a list so that it matches another list:
from difflib import SequenceMatcher

a = [x for x in "qabxcd"]
b = [x for x in "abycdf"]
c = a[:]
s = SequenceMatcher(None, a, b)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
    if tag == "delete":
        del c[i1:i2]
    elif tag == "replace":
        c[i1:i2] = b[j1-1:j2-1]
    elif tag == "insert":
        c[i1:i2] = b[j1:j2]
print(c)
print(b)
print(c == b)
But the resulting list c is not equal to b:
delete a[0:1] (['q']) b[0:0] ([])
equal a[1:3] (['a', 'b']) b[0:2] (['a', 'b'])
replace a[3:4] (['x']) b[2:3] (['y'])
equal a[4:6] (['c', 'd']) b[3:5] (['c', 'd'])
insert a[6:6] ([]) b[5:6] (['f'])
['a', 'b', 'x', 'b', 'd', 'f']
['a', 'b', 'y', 'c', 'd', 'f']
False
What is the problem?
Upvotes: 4
Views: 744
Reputation: 14255
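The OP's example can be simplified slightly by building c up from an empty string instead of editing a copy of a. Because c is never indexed into, the opcode indexes cannot go stale, and there is no need to handle the "delete" case at all (deleted items are simply never copied):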
from difflib import SequenceMatcher

a = "qabxcd"
b = "abycdf"
c = ""  # <-- initialize to empty string
s = SequenceMatcher(None, a, b)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    ...  # opcode printing, as in the question, omitted
    if tag == "equal":
        c += a[i1:i2]
    elif tag in ["replace", "insert"]:
        c += b[j1:j2]
    # "delete": nothing to copy
print(c)
print(b)
assert b == c
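A minimal sketch of the same idea that also works when a and b are lists, as in the question: c starts as an empty list and grows with extend. The function name patch_by_rebuilding is hypothetical, not part of difflib:

```python
from difflib import SequenceMatcher


def patch_by_rebuilding(a, b):
    """Build a new list from the opcodes instead of editing a copy of a.

    Nothing is modified in place, so the (i1, i2, j1, j2) indexes never go
    stale, and "delete" needs no branch: those items are simply not copied.
    """
    result = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            result.extend(a[i1:i2])  # unchanged run, taken from a
        elif tag in ("replace", "insert"):
            result.extend(b[j1:j2])  # new items, taken from b
    return result


print(patch_by_rebuilding(list("qabxcd"), list("abycdf")))
# ['a', 'b', 'y', 'c', 'd', 'f']
```

Since the opcodes cover all of b in order, and every "equal" slice of a matches the corresponding slice of b, the result always equals list(b).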
Upvotes: 0
Reputation: 3774
Each delete, insert, or replace of unequal length shifts the indexes of everything after it, so when patching c in place I have to keep a running count of the changes:
from difflib import SequenceMatcher

a = [x for x in "abyffgh fg99"]
b = [x for x in "999aby99ff9h9"]
c = a[:]
s = SequenceMatcher(None, a, b)
i = 0  # running offset: position p in a is at p + i in c
for tag, i1, i2, j1, j2 in s.get_opcodes():
    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s) c[%d:%d] (%s)" %
           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2], i1, i2, c[i1 + i:i2 + i]))
    if tag == "delete":
        del c[i1 + i:i2 + i]
        i -= i2 - i1
    elif tag == "replace":
        c[i1 + i:i2 + i] = b[j1:j2]
        i -= i2 - i1 - j2 + j1
    elif tag == "insert":
        c[i1 + i:i2 + i] = b[j1:j2]
        i += j2 - j1
    print(c)
    print(i)
print(c)
print(b)
print(c == b)
Output (last few steps of the run):
['9', '9', '9', 'a', 'b', 'y', '9', '9', 'f', 'f', '9', 'h', ' ', 'f', 'g', '9', '9']
5
delete a[7:10] ([' ', 'f', 'g']) b[12:12] ([]) c[7:10] ([' ', 'f', 'g'])
['9', '9', '9', 'a', 'b', 'y', '9', '9', 'f', 'f', '9', 'h', '9', '9']
1
equal a[10:11] (['9']) b[12:13] (['9']) c[10:11] (['h'])
['9', '9', '9', 'a', 'b', 'y', '9', '9', 'f', 'f', '9', 'h', '9', '9']
1
delete a[11:12] (['9']) b[13:13] ([]) c[11:12] (['9'])
['9', '9', '9', 'a', 'b', 'y', '9', '9', 'f', 'f', '9', 'h', '9']
-1
['9', '9', '9', 'a', 'b', 'y', '9', '9', 'f', 'f', '9', 'h', '9']
['9', '9', '9', 'a', 'b', 'y', '9', '9', 'f', 'f', '9', 'h', '9']
True
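The same bookkeeping can be written more compactly: delete (where j1 == j2), insert (where i1 == i2) and replace are all "overwrite the c slice with the b slice", and each one moves later positions by (j2 - j1) - (i2 - i1). A minimal sketch; patch_with_offset is a hypothetical name:

```python
from difflib import SequenceMatcher


def patch_with_offset(a, b):
    """Edit a copy of a in place, tracking how far earlier edits shifted it."""
    c = list(a)
    offset = 0  # position p in the original a is now at p + offset in c
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            continue
        # delete: b[j1:j2] is empty; insert: i1 == i2, so nothing is removed
        c[i1 + offset:i2 + offset] = b[j1:j2]
        offset += (j2 - j1) - (i2 - i1)
    return c


print(patch_with_offset(list("abyffgh fg99"), list("999aby99ff9h9")))
```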
Upvotes: 2
Reputation: 14209
I think I can see why: the 5-tuples returned by s.get_opcodes() refer to the initial state of your sequences, so they have to be adjusted as soon as you modify c. That is notably the case for the delete operation, which shifts every later index down (that's why 'x' is not turned into 'y').
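Since the 5-tuples describe the original sequences, another way around the stale indexes (a swapped-in technique, not what this answer does next) is to apply the opcodes from last to first: an edit only moves items after it, and every opcode still to be applied lies entirely before it. A minimal sketch; patch_in_reverse is a hypothetical name:

```python
from difflib import SequenceMatcher


def patch_in_reverse(a, b):
    """Apply the opcodes from last to first, so no offset is needed.

    Each edit only changes c at or after position i1, and every opcode
    still to be applied covers positions before i1, so its original
    indexes are still valid when its turn comes.
    """
    c = list(a)
    for tag, i1, i2, j1, j2 in reversed(SequenceMatcher(None, a, b).get_opcodes()):
        if tag != "equal":
            c[i1:i2] = b[j1:j2]
    return c


print(patch_in_reverse(list("qabxcd"), list("abycdf")))
# ['a', 'b', 'y', 'c', 'd', 'f']
```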
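As far as I could see, in this example the delete operation is the only one changing the indexes (the replace swaps one item for one item and the insert comes last; an insert or an unequal-length replace earlier on would shift them too), so I would replace the deleted items by a marker (I used '#') and remove the markers at the end: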
>>> c = a[:]
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
...     print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
...            (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
...     if tag == "delete":
...         c[i1:i2] = ['#' for i in range(i1, i2)]
...     elif tag == "replace":
...         c[i1:i2] = b[j1:j2]
...     elif tag == "insert":
...         c[i1:i1] = b[j1:j2]
...     print(c)
...
delete a[0:1] (['q']) b[0:0] ([])
['#', 'a', 'b', 'x', 'c', 'd']
equal a[1:3] (['a', 'b']) b[0:2] (['a', 'b'])
['#', 'a', 'b', 'x', 'c', 'd']
replace a[3:4] (['x']) b[2:3] (['y'])
['#', 'a', 'b', 'y', 'c', 'd']
equal a[4:6] (['c', 'd']) b[3:5] (['c', 'd'])
['#', 'a', 'b', 'y', 'c', 'd']
insert a[6:6] ([]) b[5:6] (['f'])
['#', 'a', 'b', 'y', 'c', 'd', 'f']
>>> c = [i for i in c if i != '#']
>>> c
['a', 'b', 'y', 'c', 'd', 'f']
>>>
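A variant of this marker approach that uses a unique object() as the marker, so that a real '#' in the data is not thrown away along with the deleted items. It is a sketch with a hypothetical function name, and it keeps the same assumption as the answer: deletes are the only edits that shift indexes before a later opcode runs.

```python
from difflib import SequenceMatcher

# Unique marker; unlike '#', it cannot be equal to any real item.
DELETED = object()


def patch_with_marker(a, b):
    c = list(a)
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "delete":
            # Overwrite instead of deleting, so later indexes stay valid.
            c[i1:i2] = [DELETED] * (i2 - i1)
        elif tag in ("replace", "insert"):
            # May shift later indexes if the two slices differ in length.
            c[i1:i2] = b[j1:j2]
    return [x for x in c if x is not DELETED]


print(patch_with_marker(list("#qab"), list("#ab")))  # ['#', 'a', 'b']
```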
Upvotes: 1