Reputation:
When reading CSV files, the first row (or the first few rows) is sometimes a header that we don't want to include in our data.
If I don't need the data from the headers, I just call next before declaring the reader (if more than one row is used for headers, I can call next multiple times):
import csv

with open('myfile.csv', newline='') as f:  # Python 3: text mode with newline=''
    next(f)  # skip first row
    reader = csv.reader(f)
    for row in reader:
        pass  # process my data
Sometimes, however, I don't want to include the headers in my data but still need their values. In that case I turn the csv.reader into a list and handle the headers separately.
with open('myfile.csv', newline='') as f:  # Python 3: text mode with newline=''
    reader = list(csv.reader(f))
    my_header = reader.pop(0)  # remove header
    for row in reader:
        pass  # process my data
This works and I'm happy with it, but I'm not sure whether it's the "best practice" way of using csv.reader, or whether there are other ways worth exploring.
Upvotes: 1
Views: 6711
Reputation: 67
A simple way to handle a CSV file that has a header line followed by the values is csv.DictReader, for example:
import csv

with open('myfile.csv', 'r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        print(row.get('column1'))  # print the value of column1, without the header
With this method you can ignore the header line and target exactly the data you need, and your code will be cleaner. Let me know if it works for you.
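If you still need the header values themselves, DictReader keeps them in its fieldnames attribute. Here is a minimal sketch; the in-memory sample data and the column names are made up for illustration and stand in for myfile.csv:

```python
import csv
import io

# Hypothetical sample data standing in for myfile.csv
sample = io.StringIO("column1,column2\na,1\nb,2\n", newline="")

csv_reader = csv.DictReader(sample)
header = csv_reader.fieldnames  # the header row is read on first access
values = [row["column1"] for row in csv_reader]

print(header)
print(values)
```

The header row is consumed by DictReader itself, so it never shows up among the data rows.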
Upvotes: 0
Reputation: 77912
It's indeed not the best practice: it reads the whole file into memory for no good reason. The funny part is that there's almost nothing to change in your first snippet to get the headers...
next(iterator) does return the "current" element:
>>> it = iter(["hello", "world"])
>>> next(it)
'hello'
>>> next(it)
'world'
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
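So all you have to do is:
import csv

with open('myfile.csv', newline='') as f:  # Python 3: text mode with newline=''
    reader = csv.reader(f)
    headers = next(reader)  # first row, already parsed into a list
    for row in reader:
        pass  # process my data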
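One caveat, visible in the StopIteration traceback above: on an empty file next(reader) raises. A rough sketch of one way to guard against that is to pass a default to next. The iter_rows helper and the sample strings below are hypothetical, only there to make the example self-contained:

```python
import csv
import io

def iter_rows(f):
    """Lazily yield each data row paired with the headers (hypothetical helper)."""
    reader = csv.reader(f)
    headers = next(reader, None)  # default value instead of StopIteration
    if headers is None:
        return  # empty file: nothing to yield
    for row in reader:
        yield dict(zip(headers, row))

empty_result = list(iter_rows(io.StringIO("", newline="")))
normal_result = list(iter_rows(io.StringIO("a,b\n1,2\n", newline="")))
print(empty_result)
print(normal_result)
```

This still reads one row at a time, so nothing more than the current row needs to be held in memory.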
FWIW, the way you skip "the first row" in your first snippet is brittle: you're actually skipping the first line, which is not necessarily the first row (some CSV formats have newlines embedded in rows), so for the "no header" version you actually want:
with open('myfile.csv', newline='') as f:  # Python 3: text mode with newline=''
    reader = csv.reader(f)
    next(reader)  # skip first row
    for row in reader:
        pass  # process my data
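To make the line-versus-row difference concrete, here is a small sketch with made-up data in which one header cell contains a quoted newline, so the header row spans two physical lines:

```python
import csv
import io

# Hypothetical data: the second header cell contains a quoted newline,
# so the header *row* spans two physical *lines*.
data = 'name,"long\nheader"\nalice,1\nbob,2\n'

# Skipping a line on the file object leaves half of the header behind.
f = io.StringIO(data, newline="")
next(f)
rows_after_line_skip = list(csv.reader(f))

# Skipping a row on the reader removes the whole header.
f = io.StringIO(data, newline="")
reader = csv.reader(f)
next(reader)
rows_after_row_skip = list(reader)

print(len(rows_after_line_skip))  # one more than expected: a header fragment leaks in
print(rows_after_row_skip)
```

With next(f), the second half of the header ends up being parsed as a data row; with next(reader), only the real data rows remain.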
Upvotes: 6