Reputation: 41
I'm a noob coder encountering a problem while parsing a csv file with the Python csv module. The problem is that my output says field values in the row are "None" for all but the first field.
Here's the first row in the ugly csv file that I'm trying to parse (the remaining rows follow the same format):
0,213726,NORTH FORK SLATE CREEK,CAMPGROUND,North Fork Slate Creek Campground | Idaho | Public Lands Information Center | Recreation Search, http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268,NA,NA,NA,NA,(208)839-2211,"Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br>
5 campsites at the confluence of Slate Creek and its North Fork. A number of trails form loops in the area. These are open to most traffic, including trail bikes.","From Slate Creek, go 8 miles east on Forest Road 354.",NA,http://www.publiclands.org/explore/reg_nat_forest.php?region=7&forest_name=Nez%20Perce%20National%20Forest,NA,NA,NA,45.6,-116.1,NA,N,0,1103,2058
Here's the code I wrote to parse the csv file (it doesn't work right!):
import csv
#READER SETTINGS
f_path = '/Users/foo'
f_handler = open(f_path, 'rU').read().replace('\n',' ')
my_fieldnames = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7',
                 'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15',
                 'col16', 'col17', 'col18', 'col19', 'col20', 'col21', 'col22', 'col23',
                 'col24', 'col25']
f_reader = csv.DictReader(f_handler, fieldnames=my_fieldnames, delimiter=',', dialect=csv.excel)
#NOW I TRY TO PARSE THE CSV FILE
i = 0
for row in f_reader:
    print "my first row was %s" % row
    i = i + 1
    if i > 0:
        break
And here is the output. It says all the fields except the first one are blank and I don't know why! Any suggestions would be much appreciated.
my first row was {'col14': None, 'col15': None, 'col16': None,
'col17': None, 'col10': None, 'col11': None, 'col12': None,
'col13': None, 'col18': None, 'col19': None, 'col2': None, 'col8': None,
'col9': None, 'col6': None, 'col7': None, 'col4': None, 'col5': None,
'col3': None, 'col1': '0', 'col25': None, 'col24': None,
'col21': None, 'col20': None, 'col23': None, 'col22': None}
Upvotes: 0
Views: 7323
Reputation: 12152
What different software systems call CSV varies a lot. Fortunately, Python's csv module is very good at handling these details, so there is no need for you to handle them by hand.
Let me emphasize something that @metaperture's answer uses but does not explain: you can avoid most of the guesswork when reading a CSV file in Python by auto-detecting the dialect. Once you nail that part, there is not much more that can go wrong.
Let me give you a simple example:
import csv
with open(filename, 'rb') as csvfile:
    # guess the dialect from the start of the file, then rewind
    dialect = csv.Sniffer().sniff(csvfile.read(10024))
    csvfile.seek(0)
    qreader = csv.reader(csvfile, dialect)
    cnt = 0
    for item in qreader:
        if cnt > 0:
            # process your data
            pass
        else:
            # the header of the csv file (field names)
            pass
        cnt = cnt + 1
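For readers on Python 3, the same idea might look like the sketch below. It assumes a small made-up semicolon-separated sample held in memory (SAMPLE is a hypothetical stand-in for a real file), and it restricts the sniffer to a few candidate delimiters, which tends to make detection more predictable on short samples.

```python
import csv
import io

# Hypothetical sample; a real file would be opened with open(path, newline="").
SAMPLE = "name;age\nalice;30\nbob;25\n"

# Let the sniffer pick the delimiter from a short list of candidates.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=",;\t|")

# Parse the sample with the detected dialect.
rows = list(csv.reader(io.StringIO(SAMPLE, newline=""), dialect))
header, records = rows[0], rows[1:]

print(dialect.delimiter)
print(header)
print(records)
```

Splitting the first row off as the header replaces the counter used above; whether the first row really is a header is something the caller has to know or check.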
Upvotes: 3
Reputation: 2463
When you do:
f_handler = open(f_path, 'rU').read().replace('\n',' ')
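you're reading the whole file into one string and removing all the newlines, which are what separate one row from the next. On top of that, csv.DictReader now receives a string instead of a file object, and iterating over a string yields one character at a time, so the first "row" it parses is just the character 0.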
Additionally, you're doing:
if i > 0:
    break
Which terminates your for loop after the first iteration.
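On why they're blank: the default restval is None (see http://docs.python.org/3.2/library/csv.html), and restval is the value DictReader assigns to every field name that has no matching value in the row. A row that parses to a single value therefore leaves all the other keys as None. Separately, when you read a file without the fieldnames argument, the keys come from its first row, and they sometimes carry stray whitespace, along the lines of "col2 ", " col3" or the like.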
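To see where the None values come from, here is a minimal sketch (Python 3, with a made-up three-column sample rather than the real file) that hands csv.DictReader the same data once as a plain string and once as a file-like object. SAMPLE, FIELDS and first_row are names invented for this illustration.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real file.
SAMPLE = "0,213726,CAMPGROUND\n1,213727,TRAILHEAD\n"
FIELDS = ["col1", "col2", "col3"]


def first_row(source):
    """Return the first row DictReader produces for the given iterable."""
    return next(csv.DictReader(source, fieldnames=FIELDS))


# A str is iterated character by character, so the first "line" is just "0".
from_string = first_row(SAMPLE)

# A file-like object is iterated line by line, so the whole row is parsed.
from_file = first_row(io.StringIO(SAMPLE, newline=""))

print(from_string)
print(from_file)
```

With the string, only col1 receives a value and the remaining keys fall back to restval, which matches the shape of the output in the question.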
A cute little wrapper I use:
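def iter_trim(dict_iter):
    # one-line equivalent without the error handling:
    #return (dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()])) for row in dict_iter)
    for row in dict_iter:
        try:
            # strip surrounding whitespace from every key and every value
            d = dict(zip([k.strip(" \t\n\r") for k in row.keys()], [v.strip(" \t\n\r") for v in row.values()]))
            yield d
        except Exception:
            # e.g. a missing field is None, which has no strip()
            print "row error:"
            print row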
Example usage:
import csv
def csv_iter(filename):
    csv_fp = open(filename)
    # guess the dialect from the first 16 KB, then rewind
    guess_dialect = csv.Sniffer().sniff(csv_fp.read(16384))
    csv_fp.seek(0)
    csv_reader = csv.DictReader(csv_fp, dialect=guess_dialect)
    return iter_trim(csv_reader)
for row in csv_iter("some-file.csv"):
    # do something...
    print row
Upvotes: 0
Reputation: 45672
Try this:
#!/usr/bin/env python
import csv
my_fieldnames = ['col' + str(i) for i in range(1,26)]
with open('input.csv', 'rb') as csvfile:
    my_reader = csv.DictReader(csvfile, fieldnames=my_fieldnames,
                               delimiter=',', dialect=csv.excel,
                               quoting=csv.QUOTE_NONE)
    for row in my_reader:
        for k,v in row.iteritems():
            print k, v
Output for your first line of input (remember that dictionaries are unordered):
col14 None
col15 None
col16 None
col17 None
col10 NA
col11 (208)839-2211
col12 "Nez Perce National Forest Operating Days: 305<br>Total Capacity: 25<br>
col13 None
col18 None
col19 None
col8 NA
col9 NA
col6 http://www.publiclands.org/explore/site.php?plicstate=ID&id=2268
col7 NA
col4 CAMPGROUND
col5 North Fork Slate Creek Campground | Idaho | Public Lands Information Center | Recreation Search
col2 213726
col3 NORTH FORK SLATE CREEK
col1 0
col25 None
col24 None
col21 None
col20 None
col23 None
col22 None
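The None values from col13 onward in this output appear to come from quoting=csv.QUOTE_NONE: the description field is quoted and contains a line break, and with quoting disabled the reader ends the row at that break. A minimal sketch (Python 3, with a shortened made-up record; SAMPLE and parse are hypothetical names) comparing the two settings:

```python
import csv
import io

# Hypothetical sample: one record whose quoted third field spans two lines.
SAMPLE = '0,CAMPGROUND,"Capacity: 25<br>\n5 campsites",NA\n'


def parse(**reader_options):
    """Parse SAMPLE with csv.reader and return all rows as a list."""
    return list(csv.reader(io.StringIO(SAMPLE, newline=""), **reader_options))


# Default quoting: the quoted field keeps its line break, giving one row.
default_rows = parse()

# QUOTE_NONE: quotes are ordinary characters, so the line break ends the row.
unquoted_rows = parse(quoting=csv.QUOTE_NONE)

print(default_rows)
print(unquoted_rows)
```

Under these assumptions, leaving quoting at its default keeps the whole record in one row, so it may be worth trying the reader without quoting=csv.QUOTE_NONE on the real file.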
Upvotes: 3