Reputation: 9501
I have two CSV files that I need to compare, and then output the differences:
CSV FORMAT:
Name Produce Number
Adam Apple 5
Tom Orange 4
Adam Orange 11
I need to compare the two CSV files and report whether there is a difference between Adam's apples on sheet 1 and sheet 2, and do the same for all names and produce numbers. Both CSV files will be formatted the same.
Any pointers would be greatly appreciated.
Upvotes: 8
Views: 31581
Reputation: 776
I have used csvdiff (here col1 is the column that identifies each row):
$ pip install csvdiff
$ csvdiff --style=compact col1 a.csv b.csv
Upvotes: 8
Reputation: 10695
If you want to use Python's csv module along with a generator function, you can walk both files row by row and compare large .csv files without loading them into memory. The example below pairs up rows by position and does a cursory equality comparison on each pair:
import csv

def csv_lazy_get(csvfile):
    # Yield one row at a time so the whole file is never held in memory.
    with open(csvfile, newline='') as f:
        r = csv.reader(f)
        for row in r:
            yield row

def csv_cmp_lazy(csvfile1, csvfile2):
    gen_2 = csv_lazy_get(csvfile2)
    for row_1 in csv_lazy_get(csvfile1):
        row_2 = next(gen_2, None)  # None once file 2 runs out of rows
        print("row_1: ", row_1)
        print("row_2: ", row_2)
        if row_2 == row_1:
            print("row_1 is equal to row_2.")
        else:
            print("row_1 is not equal to row_2.")
    gen_2.close()
Upvotes: 1
Reputation: 142256
If your CSV files are small enough to load into memory without bringing your machine to its knees, you could try something like:
import csv

with open('file1.csv', newline='') as f1, open('file2.csv', newline='') as f2:
    # Rows are stored as tuples: dicts and lists are unhashable and cannot go in a set.
    set1 = set(map(tuple, csv.reader(f1)))
    set2 = set(map(tuple, csv.reader(f2)))

print(set1 - set2)  # in 1, not in 2
print(set2 - set1)  # in 2, not in 1
print(set1 & set2)  # in both
For large files, you could load them into a SQLite3 database and use SQL queries to do the same, or sort by relevant keys and then do a match-merge.
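As a rough sketch of the SQLite route, the snippet below loads each sheet into its own table and lets EXCEPT and INTERSECT do the set arithmetic. The table names, the helper names load_rows and diff_tables, and the comma-separated in-memory sample data are all assumptions made for illustration; with real files you would pass open file objects and connect to an on-disk database instead of ":memory:".

```python
import csv
import io
import sqlite3

def load_rows(conn, table, csv_file):
    """Load a CSV with a Name,Produce,Number header into a new table."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    conn.execute(f"CREATE TABLE {table} (name TEXT, produce TEXT, number INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", reader)

def diff_tables(conn):
    """Return (only_in_1, only_in_2, in_both) as sorted lists of row tuples."""
    def query(sql):
        return sorted(conn.execute(sql).fetchall())
    only_in_1 = query("SELECT * FROM sheet1 EXCEPT SELECT * FROM sheet2")
    only_in_2 = query("SELECT * FROM sheet2 EXCEPT SELECT * FROM sheet1")
    in_both = query("SELECT * FROM sheet1 INTERSECT SELECT * FROM sheet2")
    return only_in_1, only_in_2, in_both

# Hypothetical sample data standing in for file1.csv and file2.csv.
sheet1 = io.StringIO("Name,Produce,Number\nAdam,Apple,5\nTom,Orange,4\nAdam,Orange,11\n")
sheet2 = io.StringIO("Name,Produce,Number\nAdam,Apple,7\nTom,Orange,4\nAdam,Orange,11\n")

conn = sqlite3.connect(":memory:")  # swap in a file path for data too big for RAM
load_rows(conn, "sheet1", sheet1)
load_rows(conn, "sheet2", sheet2)

only_in_1, only_in_2, in_both = diff_tables(conn)
print("in 1, not in 2:", only_in_1)
print("in 2, not in 1:", only_in_2)
print("in both:", in_both)
```

Because the comparison is on whole rows, a changed number shows up once in each "only in" list rather than as a single changed record; joining the two tables on (name, produce) would be the next step if you want old and new values side by side.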
Upvotes: 5
Reputation: 56714
import csv

def load_csv_to_dict(fname, get_key, get_data):
    # Map each row's key to its data value.
    with open(fname, newline='') as inf:
        incsv = csv.reader(inf)
        next(incsv)  # skip header
        return {get_key(row): get_data(row) for row in incsv}

def main():
    key = lambda r: tuple(r[0:2])  # (Name, Produce)
    data = lambda r: int(r[2])     # Number
    f1 = load_csv_to_dict('file1.csv', key, data)
    f2 = load_csv_to_dict('file2.csv', key, data)
    f1keys = set(f1)
    f2keys = set(f2)

    print("Keys in file1 but not file2:")
    print(", ".join(str(a)+":"+str(b) for a,b in (f1keys-f2keys)))
    print("Keys in file2 but not file1:")
    print(", ".join(str(a)+":"+str(b) for a,b in (f2keys-f1keys)))
    print("Differing values:")
    for k in (f1keys & f2keys):
        a,b = f1[k], f2[k]
        if a != b:
            print("{}:{} {} <> {}".format(k[0],k[1], a, b))

if __name__=="__main__":
    main()
Upvotes: 1
Reputation: 5952
Here is a start that does not use difflib. It is really just a point to build from, because maybe Adam and apples appear twice on the sheet; can you ensure that is not the case? Should the apples be summed, or is that an error?
import csv

sheet1 = {}
with open('sheet.csv', newline='') as fsock:
    rdr = csv.reader(fsock)
    next(rdr)  # skip the header row, if the file has one
    for row in rdr:
        name, produce, amount = row
        sheet1[(name, produce)] = int(amount)  # always an integer?
# repeat the above for the second sheet, then compare
You get the idea?
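To make the open questions above concrete, here is one possible way to finish the idea. It is only a sketch: the function names load_sheet and compare_sheets, the sum_duplicates flag, and the comma-separated sample data are assumptions for illustration, and whether a repeated (name, produce) pair should be summed or rejected is still your call.

```python
import csv
import io

def load_sheet(csv_file, sum_duplicates=False):
    """Map (name, produce) -> amount; repeated keys are summed or rejected."""
    rdr = csv.reader(csv_file)
    next(rdr)  # skip the header row (assumed to be present)
    sheet = {}
    for name, produce, amount in rdr:
        key = (name, produce)
        if key in sheet and not sum_duplicates:
            raise ValueError(f"duplicate entry for {key}")
        sheet[key] = sheet.get(key, 0) + int(amount)
    return sheet

def compare_sheets(sheet1, sheet2):
    """Return {key: (amount1, amount2)} for every key whose amounts differ.

    A key that is missing from one sheet appears with None on that side.
    """
    diffs = {}
    for key in sheet1.keys() | sheet2.keys():
        a, b = sheet1.get(key), sheet2.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs

# Hypothetical sample data; sheet 1 lists Adam's apples twice on purpose.
s1 = load_sheet(
    io.StringIO("Name,Produce,Number\nAdam,Apple,5\nTom,Orange,4\nAdam,Apple,2\n"),
    sum_duplicates=True,
)
s2 = load_sheet(
    io.StringIO("Name,Produce,Number\nAdam,Apple,7\nTom,Orange,3\nAdam,Orange,11\n")
)

diffs = compare_sheets(s1, s2)
for key, (a, b) in sorted(diffs.items()):
    print(key, a, b)
```

With real files, pass open(path, newline='') instead of the io.StringIO objects.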
Upvotes: 0
Reputation: 39522
One of the best utilities for comparing two different files is diff.
See a Python implementation here: Comparing two .txt files using difflib in Python
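For a feel of what the difflib route looks like, here is a minimal sketch using difflib.unified_diff from the standard library. The helper name diff_lines and the sample rows are made up for illustration; with real files you would read the lines from each CSV instead.

```python
import difflib

def diff_lines(lines1, lines2, name1="file1.csv", name2="file2.csv"):
    """Return the unified diff between two lists of lines (no trailing newlines)."""
    return list(
        difflib.unified_diff(lines1, lines2, fromfile=name1, tofile=name2, lineterm="")
    )

# Hypothetical sample rows standing in for the contents of the two files.
sheet1 = ["Name,Produce,Number", "Adam,Apple,5", "Tom,Orange,4", "Adam,Orange,11"]
sheet2 = ["Name,Produce,Number", "Adam,Apple,7", "Tom,Orange,4", "Adam,Orange,11"]

result = diff_lines(sheet1, sheet2)
print("\n".join(result))
```

Keep in mind that this is a line-by-line text comparison: it is sensitive to row order, so the same rows in a different order will be reported as differences unless both files are sorted first.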
Upvotes: 1