ThanaDaray
ThanaDaray

Reputation: 1693

How to compare columns from two csv files?

I am trying to compare columns from 2 csv files.

In file1.csv:

aaa,1
abc,2
bcd,2
adc,3

In file2.csv:

aaa,1
abc,0
bcd,2
adc,4

I expected to get the results "Not Equ".

If the first columns are the same but the second columns are difference.

I tried the codes below but not success:

import csv
file1 = 'C:/Users/Desktop/file1.csv'
file2 = 'C:/Users/Desktop/file2.csv'

reader1 = csv.reader(open(file1))
reader2 = csv.reader(open(file2))

for row1 in reader1:
    text1 = row1[0].split(',')
    test1sentence = text1[0]
    test1class = text1[1]

    for row2 in reader2:
        text2 = row2[0].split(',')
        test2sentence = text2[0]
        test2class = text2[1]

        if test1sentence == test2sentence:
            if test1class != test2class:
                print "Not Equ"

Any suggestion?

Upvotes: 0

Views: 3543

Answers (3)

Adam Matan
Adam Matan

Reputation: 136141

You're basically comparing to dictionaries (or maps), that matches keys to values.

The right way to do this is to compare the keysets. If they're equal, for each key, compare the value in both files.

#!/usr/bin/python

def file_to_dict(filename):
    lines  =  open(filename).read().split()
    return dict([line.split(',') for line in lines])

dict1, dict2 = file_to_dict('file1.csv'), file_to_dict('file2.csv')

print "Keys are equal? ", set(dict1.keys())==set(dict2.keys())
print "Values are equal? ", all([dict1[key]==dict2[key] for key in dict1])

Upvotes: 2

Janne Karila
Janne Karila

Reputation: 25197

You can read the files into two dicts and compare like this:

dict1 = dict(row for row in reader1 if len(row) == 2)
dict2 = dict(row for row in reader2 if len(row) == 2)

if sorted(dict1.keys()) == sorted(dict2.keys()):
    if dict1 != dict2:
        print "Not Equ"

dict(reader1) turns the first column into keys of the dict and the second column into values.

The generator expression (row for row in reader1 if len(row) == 2) filters out rows that do not have exactly two columns.

To compare the keys, they must be sorted. Comparing whole dicts (dict1 != dict2) will compare sorted key-value pairs. Because we compared the keys first, we know that any difference between the whole dicts is due to values.


EDIT: To compare individual items, not whole columns:

dict1 = dict(row for row in reader1 if len(row) == 2)
dict2 = dict(row for row in reader2 if len(row) == 2)

for key, val in dict1.iteritems():
    try:
        if dict2[key] != val:
            print "Not Equ"
    except KeyError:
        pass

Upvotes: 1

Dmitry Dubovitsky
Dmitry Dubovitsky

Reputation: 2236

You can parse it into dictionaries and make compare, like this:

>>> d1={'aaa':1, 'abc':2, 'bcd':2, 'adc':3}
>>> d2={'aaa':1, 'abc':0, 'bcd':2, 'adc':4}
>>> d1==d2
False
>>> d2={'aaa':1, 'abc':2, 'bcd':2, 'adc':3}
>>> d1==d2
True

Upvotes: 0

Related Questions