Reputation:
It seems that csv.DictReader skips empty lines, even when restval is set. Using the following, empty lines in the input file are skipped:
import csv
import sys

CSV_FIELDS = ("field1", "field2", "field3")

for row in csv.DictReader(open("f"), fieldnames=CSV_FIELDS, restval=""):
    # Expected to trigger on the empty lines, but it never does.
    if not row or not row[CSV_FIELDS[0]]:
        sys.exit("never reached, why?")
Where file f is (with two empty lines between the data rows):
1,2,3


a,b,c
Upvotes: 3
Views: 4694
Reputation: 880459
Inside the csv.DictReader class:
# unlike the basic reader, we prefer not to return blanks,
# because we will typically wind up with a dict full of None
# values
while row == []:
    row = self.reader.next()
So empty rows are skipped.
If you don't want to skip empty lines, you could instead use csv.reader.
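As a minimal Python 3 sketch of the difference, the snippet below feeds the same data to both readers. The in-memory `DATA` string is a hypothetical stand-in for file f, not something from the question.

```python
import csv
import io

# Hypothetical stand-in for file f: two data rows separated by two empty lines.
DATA = "1,2,3\n\n\na,b,c\n"
CSV_FIELDS = ("field1", "field2", "field3")

# csv.reader yields an empty list for each empty line.
plain_rows = list(csv.reader(io.StringIO(DATA)))

# csv.DictReader drops those empty lists before building dicts.
dict_rows = list(
    csv.DictReader(io.StringIO(DATA), fieldnames=CSV_FIELDS, restval="")
)

print(plain_rows)      # four rows, two of them empty lists
print(len(dict_rows))  # only the two non-empty rows remain
```

With csv.reader you then decide yourself what an empty list should become.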
Another option is to subclass csv.DictReader:
import csv

CSV_FIELDS = ("field1", "field2", "field3")

class MyDictReader(csv.DictReader):
    def next(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = self.reader.next()
        self.line_num = self.reader.line_num
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

for row in MyDictReader(open("f", 'rb'), fieldnames=CSV_FIELDS, restval=""):
    print(row)
yields
{'field2': '2', 'field3': '3', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
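The subclass above targets Python 2, where iterators implement `next`. On Python 3 the same idea would need `__next__` and the built-in `next()`. The following is only a sketch under that assumption; the class name `KeepBlankDictReader` and the in-memory sample data are hypothetical.

```python
import csv
import io

CSV_FIELDS = ("field1", "field2", "field3")


class KeepBlankDictReader(csv.DictReader):
    """Hypothetical DictReader variant that also returns a dict for empty lines."""

    def __next__(self):
        if self.line_num == 0:
            # Accessed only for its side effect: reads a header row when
            # no fieldnames were passed to the constructor.
            self.fieldnames
        row = next(self.reader)  # no "skip while row == []" loop here
        self.line_num = self.reader.line_num
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d


# Stand-in for file f: two data rows separated by two empty lines.
source = io.StringIO("1,2,3\n\n\na,b,c\n")
rows = list(KeepBlankDictReader(source, fieldnames=CSV_FIELDS, restval=""))
for row in rows:
    print(row)
```

Each empty line now produces a dict whose fields are all set to restval.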
Upvotes: 6
Reputation: 251096
unutbu has already pointed out why this happens. A quick fix is to replace empty lines with ',' before passing them to DictReader; restval will then fill in the remaining fields.
import csv

CSV_FIELDS = ("field1", "field2", "field3")

with open('test.csv') as f:
    # Turn whitespace-only lines into ',' so DictReader sees a non-empty row.
    lines = (',' if line.isspace() else line for line in f)
    for row in csv.DictReader(lines, fieldnames=CSV_FIELDS, restval=""):
        print(row)
#output
{'field2': '2', 'field3': '3', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
Update:
If a quoted value spans several lines and contains blank lines, the above code won't work, because the blank lines inside the value are replaced as well. In that case you can use csv.reader like this:
import csv

CSV_FIELDS = ("field1", "field2", "field3")
RESTVAL = ''

with open('test.csv') as f:
    for row in csv.reader(f, quotechar='"'):
        if not row:
            # Don't use `dict.fromkeys` if RESTVAL is a mutable object;
            # use {k: RESTVAL for k in CSV_FIELDS} instead.
            print(dict.fromkeys(CSV_FIELDS, RESTVAL))
        else:
            print({k: v if v else RESTVAL for k, v in zip(CSV_FIELDS, row)})
If the file contains (two blank lines inside the quoted value, then two empty lines before the last row):
1,2,"


4"


a,b,c
then the output will be:
{'field2': '2', 'field3': '\n\n\n4', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
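To see why the line-replacement fix is not enough for such input, here is a small Python 3 sketch that runs both approaches on the same data. The function names and the in-memory `DATA` string are hypothetical and only illustrate the comparison.

```python
import csv
import io

CSV_FIELDS = ("field1", "field2", "field3")

# Stand-in for the file: a quoted value containing blank lines,
# then two empty lines, then a normal row.
DATA = '1,2,"\n\n\n4"\n\n\na,b,c\n'


def rows_by_line_replacement(text, restval=""):
    """Quick fix: turn whitespace-only lines into ',' before parsing."""
    lines = ("," if line.isspace() else line for line in io.StringIO(text))
    reader = csv.DictReader(lines, fieldnames=CSV_FIELDS, restval=restval)
    return [dict(r) for r in reader]


def rows_by_reader(text, restval=""):
    """csv.reader approach: the parser decides which rows are empty."""
    result = []
    for row in csv.reader(io.StringIO(text)):
        d = dict.fromkeys(CSV_FIELDS, restval)
        d.update((k, v if v else restval) for k, v in zip(CSV_FIELDS, row))
        result.append(d)
    return result


replaced = rows_by_line_replacement(DATA)
parsed = rows_by_reader(DATA)

# The quoted value survives only in the csv.reader version.
print(repr(replaced[0]["field3"]))
print(repr(parsed[0]["field3"]))
```

The first approach cannot tell a blank line between rows from a blank line inside a quoted value, because it rewrites lines before the CSV parser has seen the quotes.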
Upvotes: 3
Reputation: 2458
This is your file:
1,2,3
,,
,,
a,b,c
I added commas, and now DictReader returns the two empty lines as {'field2': '', 'field3': '', 'field1': ''}.
The restval argument only covers the case where a row has fewer values than there are fields: the missing fields are set to this value.
Here you declared three fields and every non-empty row has three values, so restval is never used. It applies to columns, not to lines.
Your lines were empty, so DictReader skipped them. To make it return empty values, mark the empty rows with commas.
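A short Python 3 sketch of the three cases, assuming in-memory strings in place of a file and a visible restval of "N/A" (both are illustrative choices, not from the question):

```python
import csv
import io

CSV_FIELDS = ("field1", "field2", "field3")


def read(text):
    """Parse text with DictReader, using 'N/A' as a visible restval."""
    reader = csv.DictReader(
        io.StringIO(text), fieldnames=CSV_FIELDS, restval="N/A"
    )
    return [dict(r) for r in reader]


short_row = read("1,2\n")   # one value missing: restval fills field3
comma_row = read(",,\n")    # three empty values: restval is not needed
blank_line = read("\n")     # empty line: no row is produced at all

print(short_row)
print(comma_row)
print(blank_line)
```

Only the first case uses restval; the empty line never reaches the point where restval would be applied.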
Upvotes: 0