Reputation: 43
I have two CSV files with 6 columns each and both have one common column EmpID
(the primary key for comparison). For Example, File1.csv
is:
EmpID1,Name1,Email1,City1,Phone1,Hobby1
120034,Tom Hanks,[email protected],Mumbai,8888999,Fishing
And File2.csv
is
EmpID2,Name2,Email2,City2,Phone2,Hobby2
120034,Tom Hanks,[email protected],Mumbai,8888999,Running
The files need to be compared for differences and only rows and columns that are different should be added into a new output file as
EmpID1,Email1,Email2,Hobby1,Hobby2
120034,[email protected],[email protected],Fishing,Running
Currently I have written the below piece of code in Python. Now I am wondering on how to identify and pick the differences. Any pointers and help will be much appreciated.
import csv
import os
os.getcwd()
os.chdir('filepath')
with open('File1.csv', 'r') as csv1, open('File2.csv', 'r') as csv2:
file1 = csv1.readlines()`
file2 = csv2.readlines()`
with open('OutputFile.csv', 'w') as output:
for line in file1:`
if line not in file2:
output.write(line)
output.close()
csv1.close()
csv2.close()
Upvotes: 0
Views: 749
Reputation: 222902
First read the files to a dict structure, with the 'EMPID' as key pointing to the entire row:
import csv
fieldnames = [] # to store all fieldnames
with open('File1.csv') as f:
cf = csv.DictReader(f, delimiter=',')
data1 = {row['EMPID1']: row for row in cf}
fieldnames.extend(cf.fieldnames)
with open('File2.csv') as f:
cf = csv.DictReader(f, delimiter=',')
data2 = {row['EMPID2']: row for row in cf}
fieldnames.extend(cf.fieldnames)
Then identify all ids that are in both dicts:
ids_to_check = set(data1) & set(data2)
Finally, iterate over the ids and compare the rows themselves
with open('OutputFile.csv', 'w') as f:
cw = csv.DictWriter(f, fieldnames, delimiter=',')
cw.writeheader()
for id in ids_to_check:
diff = compare_dict(data1[id], data2[id], fieldnames)
if diff:
cw.writerow(diff)
Here's the compare_dict
function implementation:
def compare_dict(d1, d2, fields_compare):
fields_compare = set(field.rstrip('12') for field in fields_compare)
if any(d1[k + '1'] != d2[k + '2'] for k in fields_compare):
# they differ, return a new dict with all fields
result = d1.copy()
result.update(d2)
return result
else:
return {}
Upvotes: 1