Differences in lists of lists using Python

Question

I have two lists of lists like the ones below. I know I can use set(list1)-set(list2), or vice versa, to print the list that differs from its counterpart. However, I do not want the complete list printed out; I just want the part of the list that has been modified.

For example, list1:

[['Code', 'sID', 'dID', 'cID', 'ssID'], ['ABCD-00', 'ABCD-00-UNK', '358', '1234', '9999'], ['ABCD-01', 'ABCD-00-UNK', 160, '993', '587']]

list2:

[['Code', 'sID', 'dID', 'cID', 'ssID', 'AddedColumn'], ['ABCD-00', 'ABCD-00-UNK', '358', '1234', '9999', 'AddedValue1'], ['ABCD-01', 'ABCD-00-UNK', 160, '993', 'ChangedValue', 'AddedValue2']]

If I do set difference, it prints out the entire list. I want the output to show only the columns that are different/added/taken away when 'Code', 'sID' are the same.

The first list of each list of lists is the header. So I want to compare the lists when values from 'Code', 'sID' columns are matching.

Desired output:

Added - ['AddedColumn', 'AddedValue1', 'AddedValue2']
Deleted - []
Changed - ['Code', 'ABCD-01', 'ssID', 'ChangedValue']

Something like this, or anything simpler, is also fine.

The code I've tried:

from difflib import SequenceMatcher

matcher = SequenceMatcher()
for a, b in zip(list1, list2):
    matcher.set_seqs(a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal': continue
        print('{:>7s} {} {}'.format(tag, a[i1:i2], b[j1:j2]))

It works well in comparing corresponding lists, i.e., sub-list1 in list1 with sub-list1 in list2. But I want it to compare across the entire list because if a particular sub-list is missing, it prints that everything is different. By sub-list I mean, for example ['Code', 'sID', 'dID', 'cID', 'ssID'] in list1 is sub-list1.

user3467349 · Accepted Answer

Here is my rudimentary interpretation. The OP isn't quite clear about what they want in the changed list, so they should state their requirements more specifically. As jsbueno suggests, a dict may be better; it really depends, and lists are cheaper if that's the format the data came in.

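l1, l2 = list1, list2  # the two lists of lists from the question
added = []
deleted = []
changed = []
for sub_l1, sub_l2 in zip(l1, l2):
    # compare only the positions that both sub-lists share
    for i in range(min(len(sub_l1), len(sub_l2))):
        if sub_l1[i] != sub_l2[i]:
            changed.append(sub_l2[i])
    # trailing items present in only one sub-list count as added or deleted
    if len(sub_l2) > len(sub_l1):
        added.append(sub_l2[len(sub_l1):])
    elif len(sub_l1) > len(sub_l2):
        deleted.append(sub_l1[len(sub_l2):])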

Sample output:

In [66]: added
Out[66]: [['AddedColumn'], ['AddedValue1'], ['AddedValue2']]
In [67]: deleted
Out[67]: []
In [68]: changed
Out[68]: ['ChangedValue']

Note that changed doesn't tell you which value changed or where. Generally you might want to store a tuple that identifies the sublist (the CSV row) and the column number along with the value.
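
To illustrate that last point, here is a minimal sketch of one way to record where each change happened. It also matches rows by their ('Code', 'sID') values through a dict, rather than by position, since that is what the question asks for. The function name diff_tables and the key_cols parameter are made up for this example, and it assumes the first sub-list of each table is the header and that the key columns appear in both headers.

```python
def diff_tables(old, new, key_cols=("Code", "sID")):
    """Compare two header-first lists of lists, matching rows on key_cols.

    Hypothetical helper for illustration; assumes key_cols exist in both headers.
    """
    old_header, new_header = old[0], new[0]

    def index_rows(table):
        # Map (key values) -> {column name: value} for every data row.
        header = table[0]
        key_idx = [header.index(col) for col in key_cols]
        return {
            tuple(row[i] for i in key_idx): dict(zip(header, row))
            for row in table[1:]
        }

    old_rows = index_rows(old)
    new_rows = index_rows(new)

    # Columns are compared by name, so column order does not matter.
    added_cols = [c for c in new_header if c not in old_header]
    deleted_cols = [c for c in old_header if c not in new_header]
    common_cols = [c for c in old_header if c in new_header]

    changed = []
    for key, new_row in new_rows.items():
        old_row = old_rows.get(key)
        if old_row is None:
            continue  # row exists only in the new table; not reported in this sketch
        for col in common_cols:
            if old_row[col] != new_row[col]:
                # Record the row key, the column, and both values.
                changed.append((key, col, old_row[col], new_row[col]))
    return added_cols, deleted_cols, changed


list1 = [['Code', 'sID', 'dID', 'cID', 'ssID'],
         ['ABCD-00', 'ABCD-00-UNK', '358', '1234', '9999'],
         ['ABCD-01', 'ABCD-00-UNK', 160, '993', '587']]
list2 = [['Code', 'sID', 'dID', 'cID', 'ssID', 'AddedColumn'],
         ['ABCD-00', 'ABCD-00-UNK', '358', '1234', '9999', 'AddedValue1'],
         ['ABCD-01', 'ABCD-00-UNK', 160, '993', 'ChangedValue', 'AddedValue2']]

added, deleted, changed = diff_tables(list1, list2)
print(added)    # ['AddedColumn']
print(deleted)  # []
print(changed)  # [(('ABCD-01', 'ABCD-00-UNK'), 'ssID', '587', 'ChangedValue')]
```

Because rows are looked up by key, a sub-list that is missing from one table no longer shifts every later comparison, which was the problem with zipping the two lists together. Rows that exist in only one table are simply skipped here; reporting them would be a small extension.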
