Reputation: 317
I was using the csv module to do some parsing, but I have been confounded by a problem that I couldn't quite put my finger on. I think the for-loop over the csv.reader() eliminates each row that has been looped through. For example:
import csv
f = open('example.csv', 'rb')
r = csv.reader(f)
for row in r:
    print(row[0])
for row in r:
    print(row[1])
While the first for-loop prints out some stuff, the second does not. It would be great if someone could explain what is going on here behind the scenes.
Upvotes: 0
Views: 197
Reputation: 552
It's late, so sorry about any mistakes. I need to brush up on my Python skills, but I will try to solve your problem.
From the documentation:
csv.reader(csvfile, dialect='excel', **fmtparams) Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
A reader starts at the beginning of the target file and advances by one line every time next() is called, so you should not expect it to iterate over the same file again: once it has reached the end it is exhausted, and a second loop finds nothing left to read.
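If you do need two passes over the data, here is a minimal sketch of two common ways to get them. It is written for Python 3 and uses an in-memory io.StringIO with made-up sample data in place of example.csv, so the helper names and SAMPLE are only illustrative assumptions:

```python
import csv
import io

# Hypothetical sample data standing in for example.csv.
SAMPLE = "name,lastname,number\nname1,lastname1,99999999\nname2,lastname2,88888888\n"


def columns_by_rewinding(f):
    """Read column 0, rewind the file, then read column 1 with a fresh reader."""
    first = [row[0] for row in csv.reader(f)]
    f.seek(0)  # move the underlying file back to the start
    second = [row[1] for row in csv.reader(f)]
    return first, second


def columns_from_list(f):
    """Store all rows in a list once; a list can be looped over repeatedly."""
    rows = list(csv.reader(f))
    return [row[0] for row in rows], [row[1] for row in rows]


def second_pass_without_rewind(f):
    """Reproduce the problem: the second pass over the same reader is empty."""
    reader = csv.reader(f)
    for _ in reader:
        pass  # the first loop consumes every row
    return list(reader)  # nothing is left for a second loop
```

Storing the rows in a list is usually the simpler choice when the file fits in memory; rewinding with seek(0) avoids holding all rows at once.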
In this example, we read the first 2 lines of a csv file:
import csv
with open('example.csv', 'rb') as csvfile:
    r = csv.reader(csvfile)
    line = r.next()
    line2 = r.next()
    print line
    print line2
The result is:
['name', 'lastname', 'number']
['name1', 'lastname1', '99999999']
So, from this you can assume that the next() method from the reader returns a list of tokens as strings for each line read from the csv.
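Note that r.next() is the Python 2 spelling. In Python 3 the method is named __next__() and is normally called through the built-in next(). The sketch below, which assumes Python 3 and made-up in-memory data rather than example.csv, shows what happens at the end of the file: the reader raises StopIteration, and it keeps doing so on later calls.

```python
import csv
import io

# Hypothetical sample data standing in for example.csv.
SAMPLE = "name,lastname,number\nname1,lastname1,99999999\n"


def read_all_with_next(f):
    """Pull rows one at a time with next() until the reader is exhausted."""
    reader = csv.reader(f)
    rows = []
    while True:
        try:
            rows.append(next(reader))
        except StopIteration:
            break  # this is the signal a for-loop uses to stop
    # An exhausted reader stays exhausted; the default value is returned instead.
    leftover = next(reader, None)
    return rows, leftover
```

A for-loop does the same thing behind the scenes: it calls next() repeatedly and stops quietly at StopIteration, which is why a second loop over the same reader prints nothing.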
From the documentation: For Loops
...any object with an iterable method can be used in a for loop in Python...
...Having an iterable method basically means that the data can be presented in list form, where there's multiple values in an orderly fashion. You can define your own iterables by creating an object with next() and iter() methods...
See this example:
import csv
csvList = []
with open('example.csv', 'rb') as csvfile:
    r = csv.reader(csvfile)
    for row in r:
        csvList.append(row)
print csvList
This outputs the rows as a list of lists of strings, including the header row:
[['name', 'lastname', 'number'], ['name1', 'lastname1', '99999999'], ['name2', 'lastname2', '88888888']]
There are some utilities you can make good use of if you intend to parse a csv, like DictReader().
import csv
with open('example.csv', 'rb') as csvfile:
    r = csv.DictReader(csvfile)
    for row in r:
        print(row['name'], row['lastname'], row['number'])
This will print:
('name1', 'lastname1', '99999999')
('name2', 'lastname2', '88888888')
From the documentation:
csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds): Create an object which operates like a regular reader but maps the information read into a dict whose keys are given by the optional fieldnames parameter.
So if you want a list of the values for each field of your csv file, you could create a dictionary of lists and, in each iteration, append each value of the row to the list for its field.
import csv
nameDict = {'name':[], 'lastname':[], 'number':[]}
with open('example.csv', 'rb') as csvfile:
    r = csv.DictReader(csvfile)
    for row in r:
        nameDict['name'].append(row['name'])
        nameDict['lastname'].append(row['lastname'])
        nameDict['number'].append(row['number'])
print nameDict
This produces something like this:
{'lastname': ['lastname1', 'lastname2', 'lastname3', 'lastname4', 'lastname5', 'lastname6', 'lastname7', 'lastname8', 'lastname9', 'lastname10'], 'name': ['name1', 'name2', 'name3', 'name4', 'name5', 'name6', 'name7', 'name8', 'name9', 'name10'], 'number': ['99999999', '88888888', '88888888', '88888888', '88888888', '88888888', '88888888', '88888888', '88888888', '88888888']}
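If you would rather not hardcode the column names, a possible variation is to take them from the reader's fieldnames attribute, which DictReader fills in from the header row. This is only a sketch for Python 3; columns_as_lists and the sample data are hypothetical names of mine:

```python
import csv
import io

# Hypothetical sample data standing in for example.csv.
SAMPLE = "name,lastname,number\nname1,lastname1,99999999\nname2,lastname2,88888888\n"


def columns_as_lists(f):
    """Build {column name: [values...]} using whatever headers the file has."""
    reader = csv.DictReader(f)
    # fieldnames is None for an empty file, so fall back to an empty list.
    names = reader.fieldnames or []
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(row[name])
    return columns
```

Because the result is a plain dictionary of lists, you can loop over any column as many times as you like without touching the file again.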
Refer to the module documentation for better ways to parse csv.
Hope this helps you.
Upvotes: 1