Reputation: 11

Want to remove duplicate headers which I am reading from my csv file, but duplicate values do need to be written to csv

Suppose this is the CSV file that I'm reading:

header1, header2, header3
1,2,3
1,2,3
header1, header2, header3
4,5,6
4,5,6
header1, header2, header3
7,8,9
7,8,9

This is what I have tried. It might not be remotely right, as I'm very new to Python.

with open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithDuplicates.csv","r") as inputfile, open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithoutDuplicates.csv","w") as outputfile:

 for row in input:
  print(row)
  if(row[0]!="header1" and row[1]!="header2" and row[2]!="header3"):
   output.write(row)

Expected output csv file :

header1, header2, header3
1,2,3
1,2,3
4,5,6
4,5,6
7,8,9
7,8,9

Basically, the header should appear only once, but duplicate data rows should still be written.

Upvotes: 1

Answers (3)

Prasanth Ganesan

Reputation: 551

Just use a flag.

print_headers = True    
with open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithDuplicates.csv","r") as inputfile, open("D:\\Python\\Python_Assignments\\DummyDuplicatesCheck\\dummyWithoutDuplicates.csv","w") as outputfile:
    # Each row is a whole line of text, so test the line itself;
    # row[0] would be the single character 'h', never "header1".
    for row in inputfile:
        if row.startswith("header1"):
            if print_headers:
                outputfile.write(row)
                print_headers = False
        else:
            outputfile.write(row)

Upvotes: 0

amasoudfam

Reputation: 89

Try this one:

with open("src.csv","r") as inputfile, open("out.csv","w") as outputfile:
    lines = inputfile.readlines()
    outputfile.write(lines[0])
    for line in lines[1:]:
        if line != lines[0]:
            outputfile.write(line)
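
One caveat with comparing raw lines: a repeated header that happens to be the last line of the file and has no trailing newline will not equal lines[0]. A hedged variant, assuming it is acceptable to ignore line endings when comparing (the helper name strip_repeated_header and the tiny sample list are made up for this sketch):

```python
def strip_repeated_header(lines):
    """Return lines with later copies of the first line removed."""
    if not lines:
        return []
    header = lines[0].rstrip("\r\n")
    kept = [lines[0]]
    for line in lines[1:]:
        # Compare without line endings so a final, unterminated header still matches.
        if line.rstrip("\r\n") != header:
            kept.append(line)
    return kept

# Hypothetical sample: the last line is a header with no trailing newline.
lines = ["h1,h2\n", "1,2\n", "1,2\n", "h1,h2"]
cleaned = strip_repeated_header(lines)
print(cleaned)
```

Duplicate data rows are kept because only lines equal to the first line are dropped.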

Upvotes: 1

Jean-François Fabre

Reputation: 140246

First, note that your example never wraps the file handle in a csv object, so your code compares characters from the file instead of fields. You're missing a step: instantiating csv.reader and csv.writer objects.

When you read a file line by line and you check row[0]!="header1" and row[1]!="header2" and row[2]!="header3", it cannot work because row[0] == 'h', row[1] == 'e' and so on...
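
A quick way to see the difference, as a small sketch that uses the first line of the sample file as a literal string (the variable names are only for illustration):

```python
import csv

# One line of text, as yielded when iterating over an open file.
line = "header1, header2, header3\n"

# Indexing a string gives single characters, not CSV fields.
first_chars = (line[0], line[1], line[2])
print(first_chars)  # ('h', 'e', 'a')

# csv.reader accepts any iterable of lines and splits each one into fields.
fields = next(csv.reader([line]))
print(fields)  # ['header1', ' header2', ' header3']
```

Note that the spaces after the commas stay part of the fields by default; that is harmless here because every repeated header carries the same spaces.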

Using the csv module, I would read the title line separately by manually advancing the csv reader with next(). Then I would write each following row to the output only if it differs from the title.

Like this:

import csv

with open("input.csv","r",newline="") as inputfile, open("output.csv","w",newline="") as outputfile:
    csv_in = csv.reader(inputfile)
    csv_out = csv.writer(outputfile)
    title = next(csv_in)  # read the header row once
    csv_out.writerow(title)
    for row in csv_in:
        if row != title:  # skip repeated headers, keep repeated data rows
            csv_out.writerow(row)

Or, replacing the for loop with a single writerows call:

    csv_out.writerows(row for row in csv_in if row != title)
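
To try the approach without touching any files, here is a minimal sketch that runs the same logic on part of the question's sample data held in memory. The io.StringIO buffers, the helper name drop_repeated_headers, and the explicit lineterminator are assumptions made for this illustration, not part of the code above.

```python
import csv
import io

def drop_repeated_headers(infile, outfile):
    """Copy CSV rows from infile to outfile, keeping the header only once."""
    csv_in = csv.reader(infile)
    # lineterminator="\n" is chosen here only to keep the in-memory output simple.
    csv_out = csv.writer(outfile, lineterminator="\n")
    title = next(csv_in)  # the first row is the header
    csv_out.writerow(title)
    # Rows equal to the header are skipped; repeated data rows are kept.
    csv_out.writerows(row for row in csv_in if row != title)

sample = (
    "header1, header2, header3\n"
    "1,2,3\n"
    "1,2,3\n"
    "header1, header2, header3\n"
    "4,5,6\n"
    "4,5,6\n"
)
result = io.StringIO()
drop_repeated_headers(io.StringIO(sample), result)
print(result.getvalue())
```

The header should appear once in the printed result, followed by all four data rows, duplicates included.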

Upvotes: 2
