Bob McBobson

Reputation: 904

How to parse the contents of a .csv file into a dict while always skipping the header?

I apologize if this is a really basic question, but I have designed a program that requires the user to input a .csv file. I have set up my code to parse this .csv file, remove any characters that could interfere with the rest of the code, and remove any dict entry with the key 'Sequence number' (which is the header in the .csv). However, I know this way of removing the header isn't very Pythonic, and it also assumes that the user will always upload a .csv file whose header follows the format 'Sequence number', 'sequence1', 'sequence2'. What should I do so that the header is always skipped when the file is parsed into a dict, no matter how the header is set up?

import csv

def check_sequences_csv_file(csv_file):
    sequence_dict = {}
    try:
        # newline='' replaces the old "rU" mode, which Python 3.11 removed
        with open(csv_file, newline='') as csvfile:
            sequences = csv.reader(csvfile, dialect='excel')
            for line in sequences:
                candidate_number = line[0]
                # strip spaces and stray carriage returns/newlines from both sequences
                switch = line[1].replace(' ', '').replace('\r', '').replace('\n', '')
                trigger = line[2].replace(' ', '').replace('\r', '').replace('\n', '')
                sequence_dict[candidate_number] = [switch, trigger]
            # drop the header by its key: only works for this exact header text
            sequence_dict.pop('Sequence number', None)
    except IOError:
        # retrying in a loop with the same file name would never succeed, so just report it
        print("Please make sure the file is present in the working directory.")
    return sequence_dict

Upvotes: 1

Views: 53

Answers (1)

Jean-François Fabre

Reputation: 140188

Using only the built-in csv module, the naive approach would be to skip the first line of the file before creating the reader:

with open(csv_file, newline='') as csvfile:
    csvfile.readline()  # discard the first physical line of the file
    sequences = csv.reader(csvfile, dialect='excel')

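That works most of the time, BUT if the header row is multi-line (a quoted field containing a line break), readline() removes only its first physical line and the remainder gets parsed as a bogus data row. So the best way is to consume the first row of the csv.reader object itself: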

with open(csv_file, newline='') as csvfile:
    sequences = csv.reader(csvfile, dialect='excel')
    next(sequences)  # consume the whole header row, however many lines it spans
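To see the difference between the two approaches, here is a small sketch that runs both on an in-memory file (io.StringIO) whose header has a quoted field with an embedded newline. The sample data and the helper names rows_after_readline / rows_after_next are made up for illustration. The readline() version leaves the tail of the header behind as a stray first row, while the reader-based version skips the entire header row.

```python
import csv
import io

# Made-up sample data: the first header field is quoted and contains a newline,
# so the header *row* spans two physical lines.
SAMPLE = '"Sequence\nnumber",sequence1,sequence2\r\n1,ACGT,TTGA\r\n2,GGCC,AATT\r\n'


def rows_after_readline(text):
    """Naive approach: drop the first physical line, then parse the rest."""
    f = io.StringIO(text, newline='')
    f.readline()  # removes only '"Sequence\n', not the whole header row
    return list(csv.reader(f))


def rows_after_next(text):
    """Reader-based approach: let csv.reader consume the first logical row."""
    f = io.StringIO(text, newline='')
    reader = csv.reader(f)
    next(reader, None)  # skips the full header row, even though it spans two lines
    return list(reader)


print(rows_after_readline(SAMPLE))  # leftover header fragment appears as a first row
print(rows_after_next(SAMPLE))      # only the two data rows
```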

(You can also save the header row for later: title_row = next(sequences).)

Then go on with your for loop to read the rest of the rows; the pop('Sequence number', None) call is no longer needed.
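Putting the pieces together, here is a minimal Python 3 sketch of the question's function that skips the header with next() instead of popping 'Sequence number', so the header text no longer matters. It assumes, as the original code does, that every data row has at least three columns, and it leaves a missing file to the caller (open() raises FileNotFoundError). Passing None as the default to next() makes an empty file return an empty dict instead of raising StopIteration.

```python
import csv


def check_sequences_csv_file(csv_file):
    """Map each row's first column to [switch, trigger], skipping the header row.

    Assumes every data row has at least three columns, as in the question.
    """
    sequence_dict = {}
    # newline='' is what the csv docs recommend for files handed to csv.reader
    with open(csv_file, newline='') as csvfile:
        sequences = csv.reader(csvfile, dialect='excel')
        next(sequences, None)  # skip the header row, whatever its contents
        for line in sequences:
            candidate_number = line[0]
            # remove spaces and stray CR/LF characters, as the original code did
            switch = line[1].replace(' ', '').replace('\r', '').replace('\n', '')
            trigger = line[2].replace(' ', '').replace('\r', '').replace('\n', '')
            sequence_dict[candidate_number] = [switch, trigger]
    return sequence_dict
```

Because the header is consumed by position rather than matched by name, a file whose header reads 'ID', 'first', 'second' (or anything else) is handled the same way.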

Upvotes: 2
