user2261062

How to correctly handle csv.reader headers

When reading CSV files, the first row (or sometimes the first several rows) is often a header that we don't want to include in our data.

If I don't need the data from the headers, I just call next before creating the reader (if more than one row is used for headers, I can call next multiple times):

import csv

with open('myfile.csv', newline='') as f:   # text mode, as csv expects on Python 3
    next(f)                         # skip first row
    reader = csv.reader(f)
    for row in reader:
        pass  # process my data

Sometimes, however, I don't want to include the headers in my data but still need their values. In that case I convert the csv.reader into a list and handle the headers separately.

import csv

with open('myfile.csv', newline='') as f:   # text mode, as csv expects on Python 3
    reader = list(csv.reader(f))

    my_header = reader.pop(0)   # remove header

    for row in reader:
        pass  # process my data

This works and I'm happy with it, but I'm not sure whether it's the "best practice" way of using csv.reader or whether there are other ways worth exploring.

Upvotes: 1

Views: 6711

Answers (2)

cheetOos

Reputation: 67

A simple way to read a CSV file that has a header line followed by the values is csv.DictReader. For example:

import csv

with open('myfile.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        print(row.get('column1'))  # print the value of column1, without its header

With this method you can ignore the header line and target exactly the data you need, and your code will be cleaner. Let me know whether it works for you.
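
To illustrate, here is a minimal sketch that runs without a file on disk. The in-memory sample data and its column names are made up for the example; the point is that DictReader consumes the first row as the keys and still exposes it through fieldnames.

```python
import csv
import io

# Hypothetical sample data standing in for myfile.csv.
sample = io.StringIO("column1,column2\na,1\nb,2\n")

reader = csv.DictReader(sample)
rows = list(reader)  # the header row is not part of the data rows

# The header is still available if its values are needed.
header = reader.fieldnames
column1_values = [row["column1"] for row in rows]

print(header)
print(column1_values)
```

Each row is a dict keyed by the header names, so there is no separate header row to skip or pop.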

Upvotes: 0

bruno desthuilliers

Reputation: 77912

It's indeed not the best practice: it reads the whole file into memory for no good reason. The funny part is that there's almost nothing to change in your first snippet to get the headers...

next(iterator) does return the "current" element:

>>> it = iter(["hello", "world"])
>>> next(it)
'hello'
>>> next(it)
'world'
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So all you have to do is:

import csv

with open('myfile.csv', newline='') as f:   # text mode, as csv expects on Python 3
    reader = csv.reader(f)
    headers = next(reader)
    for row in reader:
        pass  # process my data
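
As a self-contained illustration, here is a small sketch assuming Python 3, where the csv module expects a text-mode file (the csv docs recommend opening it with newline=''). The helper name split_header and the sample data are hypothetical; passing a default to next is just one way to avoid a StopIteration on an empty file.

```python
import csv
import io

def split_header(f):
    """Return (headers, reader); headers is None if the input is empty.

    split_header is a hypothetical helper name used only for this sketch.
    """
    reader = csv.reader(f)
    headers = next(reader, None)  # consumes exactly one CSV row
    return headers, reader

# Hypothetical sample data standing in for myfile.csv.
sample = io.StringIO("name,qty\napple,3\npear,5\n")
headers, reader = split_header(sample)
data = [row for row in reader]  # rows are still read lazily, one at a time

print(headers)
print(data)
```

With a real file you would pass the object returned by open('myfile.csv', newline='') instead of the StringIO.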

FWIW, the way you skip "the first row" in your first snippet is brittle: you're actually skipping the first line, which is not necessarily the first row (some CSV formats have newlines embedded in rows), so for the "no header" version you actually want:

import csv

with open('myfile.csv', newline='') as f:   # text mode, as csv expects on Python 3
    reader = csv.reader(f)
    next(reader) # skip first row
    for row in reader:
        pass  # process my data
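
To make the line-versus-row difference concrete, here is a minimal sketch on made-up data whose header contains a quoted newline. The sample string and variable names are assumptions for the example only.

```python
import csv
import io

# Hypothetical data: the second header field contains an embedded newline,
# so the header row spans two physical lines.
raw = 'id,"long\nname"\n1,x\n'

# Skipping a *line* of the file before creating the reader.
f = io.StringIO(raw)
next(f)
rows_after_line_skip = list(csv.reader(f))

# Skipping a *row* through the reader.
f = io.StringIO(raw)
reader = csv.reader(f)
next(reader)
rows_after_row_skip = list(reader)

print(rows_after_line_skip)  # the tail of the header leaks in as a data row
print(rows_after_row_skip)   # only the real data row is left
```

Skipping through the reader lets the csv module decide where the row ends, which is why it copes with the multi-line header.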

Upvotes: 6
