compare two columns (in two files), then print the similar lines and different lines

Question

I have two files that contain two columns for each one, in the first column the lines are often similar.

file1.csv

C(2)—C(1) 1.5183
C(3)—C(2) 1.49
C(3)—C(1) 1.4991
O(4)—C(3) 1.4104
H(10)—O(4) 0.964
C(2)—C(1)—C(3) 59.19
C(3)—C(1)—H(5) 118.4

file2.csv

C(2)—C(1) 1.5052
C(3)—C(2) 1.505
C(3)—C(1) 1.5037
S(4)—C(3) 1.7976
H(10)—S(4) 1.3445
C(2)—C(1)—H(6) 117.68
C(2)—C(1)—C(3) 60.3
C(3)—C(1)—H(5) 116.99

here is a python script "using itertools" which compares the first colone in file1.csv and file2.csv and then prints the lines similar.

import itertools

files = ['file1.csv', 'file2.csv']
d = {}

for fi, f in enumerate(files):
    fh = open(f)
    for line in fh:
        sl = line.split()
        name = sl[0]
        val = float(sl[1])
        if name not in d:
            d[name] = {}
        if fi not in d[name]:
            d[name][fi] = []
        d[name][fi].append(val)
    fh.close()

for name, vals in d.items():
    if len(vals) == len(files):
        for var in itertools.product(*vals.values()):
            if max(var) - min(var) <= 20:
                out = '{}	{}'.format(name, "	".join(map(str, var)))
                print(out)
                break

output.csv

C(2)-C(1)        1.5183   1.5052
C(3)-C(2)        1.49     1.505
C(3)-C(1)        1.4991   1.5037
C(2)-C(1)-C(3)   59.19    60.3
C(3)-C(1)-H(5)   118.4    116.99

but I do not find an idea to print also the different lines.

my desired output:

similar_lines
C(2)-C(1)           1.5183    1.5052
C(3)-C(2)           1.49      1.505
C(3)-C(1)           1.4991    1.5037
C(2)-C(1)-C(3)      59.19     60.3
C(3)-C(1)-H(5)      118.4     116.99

different_lines
O(4)-C(3)           1.4104      non
H(10)-O(4)          0.964       non
S(4)-C(3)            non       1.7976
H(10)-S(4)           non       1.3445
C(2)-C(1)-H(6)       non       117.68

compare two columns (in two files), then print the similar lines and different lines

Answers (1)

Related Questions