I am new to Python.
I am trying to read two CSV files, check whether there are any differences between them, and skip the second row of both files.
I started with something like this:
import sys

def csv_diff(file_f, file_g):
    #file_f = sys.argv[1]
    #file_g = sys.argv[2]
    set_f = set()
    set_g = set()
    with open(file_f) as f:
        line = f.readline().strip()
        while line:
            set_f.add(line)
            line = f.readline().strip()
    with open(file_g) as g:
        line = g.readline().strip()
        while line:
            set_g.add(line)
            line = g.readline().strip()
    diff = set_f - set_g
    # print set_f
    # print set_g
    # print diff
    if diff:
        #print "Data mismatch between the files"
        return False
    else:
        #print " Data Matches "
        return True
But this code is not reading the first line.
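One thing worth checking (an assumption about the input, not a confirmed diagnosis): `while line:` ends the loop as soon as `readline().strip()` returns an empty string, so any lines after the first blank line are silently ignored. Iterating over the file object directly visits every line. A minimal sketch; the name `non_blank_lines` is hypothetical:

```python
def non_blank_lines(lines):
    """Collect every non-empty, stripped line from an iterable of lines.

    `lines` can be an open file object: iterating over it yields each line
    in turn, and a blank line is skipped instead of ending the loop.
    """
    return {line.strip() for line in lines if line.strip()}
```

Usage would be `with open("man.csv") as f: lines = non_blank_lines(f)`.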
My CSV file:
File Name : man.csv
Start Time : 2017-02-17T09:46:50
Read Count : 1
Write Count : 0
Filter Count : 0
Skip Count : 1
I want to skip the line: Start Time : 2017-02-17T09:46:50
Is there an easier or better approach?
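Since every line in the sample file has the shape `Key : Value`, another option is to parse both files into dictionaries and drop the `Start Time` key, rather than relying on it always being the second line. A minimal sketch, assuming every relevant line contains a colon that separates key from value; `parse_report` and `report_diff` are hypothetical names:

```python
def parse_report(lines, ignore=("Start Time",)):
    """Parse 'Key : Value' lines into a dict, skipping ignored keys."""
    report = {}
    for line in lines:
        # partition splits at the first ':' only, so the timestamp's
        # own colons stay inside the value
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key : Value' line
        key = key.strip()
        if key not in ignore:
            report[key] = value.strip()
    return report


def report_diff(lines_a, lines_b):
    """Return {key: (value_a, value_b)} for every key whose values differ.

    A key missing from one side is reported with None for that side.
    """
    a, b = parse_report(lines_a), parse_report(lines_b)
    return {
        key: (a.get(key), b.get(key))
        for key in a.keys() | b.keys()
        if a.get(key) != b.get(key)
    }
```

With two open files (`f1`, `f2`), `report_diff(f1, f2)` returns an empty dict when they match, and otherwise names the fields that differ.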
For each file, you can use readlines() to read all lines, pop out index 1 (the second line), convert the remaining lines to a set, and then check whether the two sets are equal.
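def csv_diff(file_f, file_g):
    with open(file_f) as f:
        textf = f.readlines()
    textf.pop(1)  # drop the second line ("Start Time : ...")
    # strip() so a missing newline on the last line does not cause a false mismatch
    set_f = {line.strip() for line in textf}
    with open(file_g) as g:
        textg = g.readlines()
    textg.pop(1)
    set_g = {line.strip() for line in textg}
    return set_f == set_g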
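If you also want to see *which* lines differ, the set approach extends naturally: the symmetric difference `set_f ^ set_g` holds the lines found in exactly one file. (The question's `set_f - set_g` only catches lines missing from the second file.) A sketch on lists of lines; the name `line_differences` and its `skip_index` parameter are hypothetical:

```python
def line_differences(lines_f, lines_g, skip_index=1):
    """Return the stripped lines that appear in only one of the two inputs.

    The line at position `skip_index` (0-based) is dropped from both inputs
    first, mirroring pop(1) in the answer above. Unlike pop(1), this does
    not raise IndexError when an input has fewer than two lines.
    """
    set_f = {line.strip() for i, line in enumerate(lines_f) if i != skip_index}
    set_g = {line.strip() for i, line in enumerate(lines_g) if i != skip_index}
    return set_f ^ set_g  # symmetric difference: in one set but not both
```

An empty result means the files match once the second line is ignored.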
You can try the following if your CSV has many entries and you always want to skip the Start Time line. It also works if your CSV has only one entry.
from itertools import zip_longest

def csv_diff(file_1, file_2):
    with open(file_1, "r") as f1, open(file_2, "r") as f2:
        # zip_longest (rather than zip) so extra lines in the longer file count as a mismatch
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            if line1.startswith("Start Time"):
                continue
            if line1.strip() != line2.strip():
                print(f"The two files '{file_1}' and '{file_2}' do not match!")
                return False
    print(f"The two files '{file_1}' and '{file_2}' are a match!")
    return True
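Unlike the set-based answers, this line-by-line comparison is sensitive to line order and to duplicate lines. If you want to see where the files differ rather than just a True/False result, the standard library's `difflib.unified_diff` can report it after the Start Time lines are filtered out. A sketch; the name `report_diff_lines` and the `file_1`/`file_2` labels are illustrative:

```python
import difflib


def report_diff_lines(lines1, lines2, skip_prefix="Start Time"):
    """Return unified-diff lines between two inputs, ignoring skipped lines.

    Lines starting with `skip_prefix` are removed from both sides before
    comparing, so a differing timestamp does not show up as a change.
    An empty list means the inputs match after filtering.
    """
    kept1 = [line.rstrip("\n") for line in lines1 if not line.startswith(skip_prefix)]
    kept2 = [line.rstrip("\n") for line in lines2 if not line.startswith(skip_prefix)]
    return list(difflib.unified_diff(kept1, kept2,
                                     fromfile="file_1", tofile="file_2",
                                     lineterm=""))
```

Passing two open file objects works because iterating over a file yields its lines.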
Why not just add a simple check like this inside each of your while loops (shown here for the second file):
if "Start Time" not in line:
    set_g.add(line)
line = g.readline().strip()
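To avoid repeating the same check in both loops, the filter can live in one helper that is applied to each file. A minimal sketch, assuming the same set comparison as the question; `load_filtered` and `csv_equal` are hypothetical names. Note that `"Start Time" in line` matches the text anywhere in a line, not only at its start:

```python
def load_filtered(lines, skip_text="Start Time"):
    """Return a set of stripped lines, dropping any line that contains skip_text."""
    return {line.strip() for line in lines if skip_text not in line}


def csv_equal(lines_f, lines_g):
    """True when both inputs contain the same set of lines after filtering."""
    return load_filtered(lines_f) == load_filtered(lines_g)
```

With files, this would be called as `with open(file_f) as f, open(file_g) as g: same = csv_equal(f, g)`.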