Reputation: 11
Suppose this is the CSV file I'm reading:
header1, header2, header3
1,2,3
1,2,3
header1, header2, header3
4,5,6
4,5,6
header1, header2, header3
7,8,9
7,8,9
This is what I have tried. It might not be remotely right, but I'm very new to Python.
with open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithDuplicates.csv","r") as inputfile, open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithoutDuplicates.csv","w") as outputfile:
    for row in inputfile:
        print(row)
        if(row[0]!="header1" and row[1]!="header2" and row[2]!="header3"):
            outputfile.write(row)
Expected output CSV file:
header1, header2, header3
1,2,3
1,2,3
4,5,6
4,5,6
7,8,9
7,8,9
Basically, the header should appear only once, but duplicate data rows should still be written.
Upvotes: 1
Views: 1604
Reputation: 551
Just use a flag.
print_headers = True
with open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithDuplicates.csv","r") as inputfile, open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithoutDuplicates.csv","w") as outputfile:
    lines = inputfile.readlines()
    for row in lines:
        # Split the raw line into fields; indexing the string itself would give single characters
        fields = [field.strip() for field in row.split(",")]
        if fields == ["header1", "header2", "header3"]:
            # Write the header only the first time it is seen
            if print_headers:
                outputfile.write(row)
                print_headers = False
        else:
            outputfile.write(row)
Upvotes: 0
Reputation: 89
Try this one:
with open("src.csv","r") as inputfile, open("out.csv","w") as outputfile:
    lines = inputfile.readlines()
    # Always keep the first line (the header)
    outputfile.write(lines[0])
    # Copy every later line unless it is an exact repeat of the header
    for line in lines[1:]:
        if line != lines[0]:
            outputfile.write(line)
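readlines() reads the whole file into a list before anything is written. For large files, the same first-line comparison can be done lazily by iterating the file object directly. This is a minimal sketch under the same assumption as the answer above: every repeated header is character-for-character identical to the first line, including the trailing newline (so a header on a final line without a newline would not match and would be kept). The name drop_repeated_first_line is hypothetical, and io.StringIO stands in for the real files.

```python
import io

def drop_repeated_first_line(src, dst):
    """Copy src to dst, keeping the first line and skipping exact repeats of it."""
    header = next(src, None)  # first line, or None for an empty input
    if header is None:
        return
    dst.write(header)
    # File objects yield lines lazily, so the whole file is never held in memory
    for line in src:
        if line != header:
            dst.write(line)

# In-memory stand-ins for the real files; with files you would pass the open handles
sample = io.StringIO(
    "header1, header2, header3\n1,2,3\n1,2,3\n"
    "header1, header2, header3\n4,5,6\n4,5,6\n"
    "header1, header2, header3\n7,8,9\n7,8,9\n"
)
out = io.StringIO()
drop_repeated_first_line(sample, out)
print(out.getvalue())
```

With real files, call it as drop_repeated_first_line(inputfile, outputfile) inside the same with statement.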
Upvotes: 1
Reputation: 140246
First, note that your example never turns the file handles into csv objects, so your code compares characters from the file instead of fields. The missing step is to instantiate csv.reader and csv.writer objects.
When you read a file line by line and check row[0]!="header1" and row[1]!="header2" and row[2]!="header3", it cannot work, because each row is a plain string: row[0] == 'h', row[1] == 'e', and so on.
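To see this concretely, here is a small sketch (standard library only) that indexes the question's header line directly and then parses the same line with csv.reader. Note that csv.reader keeps the space after each comma, so even with a reader, a comparison against "header2" would not match " header2"; the skipinitialspace=True option strips whitespace that immediately follows a delimiter.

```python
import csv
import io

line = "header1, header2, header3\n"  # header line from the question's sample

# Indexing a str yields single characters, not comma-separated fields
print(line[0], line[1], line[2])  # h e a

# csv.reader splits the line into fields but keeps the space after each comma
fields = next(csv.reader(io.StringIO(line)))
print(fields)  # ['header1', ' header2', ' header3']

# skipinitialspace=True drops whitespace that immediately follows a delimiter
trimmed = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(trimmed)  # ['header1', 'header2', 'header3']
```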
Using the csv module, I would read the title line separately by advancing the csv reader manually with next(). Then I would write each following row to the output only if it differs from the title.
Like this:
import csv
with open("input.csv","r") as inputfile, open("output.csv","w",newline="") as outputfile:
    csv_in = csv.reader(inputfile)
    csv_out = csv.writer(outputfile)
    # Read the title row on its own, then copy it once
    title = next(csv_in)
    csv_out.writerow(title)
    for row in csv_in:
        if row != title:
            csv_out.writerow(row)
Or, replacing the for loop with a single writerows call:
    csv_out.writerows(row for row in csv_in if row != title)
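For a quick check without touching the file system, here is the same reader/writer approach wrapped in a function and run on an in-memory copy of the sample. dedupe_csv_headers is a hypothetical name, and the next(csv_in, None) default (which avoids a StopIteration on an empty input) is an addition not present in the answer above. csv.writer ends each row with "\r\n" by default, which is why the result is inspected with splitlines(); with real files, keep newline="" on the output file as the answer does.

```python
import csv
import io

def dedupe_csv_headers(src, dst):
    """Write the first CSV row once, then every later row that differs from it."""
    csv_in = csv.reader(src)
    csv_out = csv.writer(dst)
    title = next(csv_in, None)  # None if the input has no rows
    if title is None:
        return
    csv_out.writerow(title)
    csv_out.writerows(row for row in csv_in if row != title)

sample = io.StringIO(
    "header1, header2, header3\n1,2,3\n1,2,3\n"
    "header1, header2, header3\n4,5,6\n4,5,6\n"
)
out = io.StringIO()
dedupe_csv_headers(sample, out)
print(out.getvalue().splitlines())
```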
Upvotes: 2