user3921265

Why does csv.DictReader skip empty lines?

It seems that csv.DictReader skips empty lines, even when restval is set. Using the following, empty lines in the input file are skipped:

import csv
import sys

CSV_FIELDS = ("field1", "field2", "field3")
for row in csv.DictReader(open("f"), fieldnames=CSV_FIELDS, restval=""):
    if not row or not row[CSV_FIELDS[0]]:
        sys.exit("never reached, why?")

Where file f is:

1,2,3


a,b,c

Upvotes: 3

Views: 4694

Answers (3)

unutbu

Reputation: 880459

Inside the csv.DictReader class:

    # unlike the basic reader, we prefer not to return blanks,
    # because we will typically wind up with a dict full of None
    # values
    while row == []:
        row = self.reader.next()

So empty rows are skipped. If you don't want to skip empty lines, you could instead use csv.reader.
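
To see the difference between the two readers, here is a minimal sketch for Python 3. It uses io.StringIO in place of the question's file f, and the names SAMPLE, plain_rows and dict_rows are only illustrative:

```python
import csv
import io

# Same content as the question's file f: two blank lines between the data rows.
SAMPLE = "1,2,3\n\n\na,b,c\n"
CSV_FIELDS = ("field1", "field2", "field3")

# csv.reader yields an empty list for each blank line.
plain_rows = list(csv.reader(io.StringIO(SAMPLE)))

# csv.DictReader silently drops those empty lists before building dicts.
dict_rows = list(
    csv.DictReader(io.StringIO(SAMPLE), fieldnames=CSV_FIELDS, restval="")
)

print(plain_rows)      # four entries, two of them empty lists
print(len(dict_rows))  # only the two non-blank rows remain
```

With csv.reader you can test each row with "if not row" and decide yourself what a blank line should become.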

Another option is to subclass csv.DictReader:

import csv
CSV_FIELDS = ("field1", "field2", "field3")

class MyDictReader(csv.DictReader):
    def next(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = self.reader.next()
        self.line_num = self.reader.line_num

        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d

for row in MyDictReader(open("f", 'rb'), fieldnames=CSV_FIELDS, restval=""):
    print(row)

yields

{'field2': '2', 'field3': '3', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}
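
The subclass above targets Python 2, where the iterator method is called next. On Python 3 the method to override is __next__, so a rough equivalent might look like the sketch below. The class name KeepBlankDictReader is hypothetical, and the code leans on attributes such as self.reader that CPython's csv.DictReader happens to expose, so treat it as an assumption about the implementation rather than a documented API:

```python
import csv
import io

CSV_FIELDS = ("field1", "field2", "field3")


class KeepBlankDictReader(csv.DictReader):
    """DictReader variant that returns blank lines instead of skipping them."""

    def __next__(self):
        if self.line_num == 0:
            # Accessed only for its side effect: it reads the header row
            # when no fieldnames were passed to the constructor.
            self.fieldnames
        row = next(self.reader)
        self.line_num = self.reader.line_num

        # No "while row == []" loop here, so blank rows fall through.
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            # Extra values are collected under restkey.
            d[self.restkey] = row[lf:]
        elif lf > lr:
            # Missing values are filled with restval.
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d


sample = io.StringIO("1,2,3\n\n\na,b,c\n")
rows = list(KeepBlankDictReader(sample, fieldnames=CSV_FIELDS, restval=""))
for row in rows:
    print(row)
```

On Python 3, open a real file in text mode with newline="" rather than 'rb'.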

Upvotes: 6

Ashwini Chaudhary

Reputation: 251096

Unutbu already pointed out the reason why this happens. A quick fix is to replace empty lines with ',' before passing them to DictReader; restval then fills in the remaining fields.

import csv

CSV_FIELDS = ("field1", "field2", "field3")

with open('test.csv') as f:
    # Turn whitespace-only lines into ',' so DictReader sees a non-empty row.
    lines = (',' if line.isspace() else line for line in f)
    for row in csv.DictReader(lines, fieldnames=CSV_FIELDS, restval=""):
        print(row)

#output
{'field2': '2', 'field3': '3', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}

Update:

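If a quoted field spans several lines and contains blank lines, the code above breaks it, because the blank lines inside the quotes are replaced with ',' as well. In that case you can use csv.reader like this: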

RESTVAL = ''

with open('test.csv') as f:
    for row in csv.reader(f, quotechar='"'):
        if not row:
            # Don't use `dict.fromkeys` if RESTVAL is a mutable object; use
            # {k: RESTVAL for k in CSV_FIELDS} instead
            print(dict.fromkeys(CSV_FIELDS, RESTVAL))
        else:
            print({k: v if v else RESTVAL for k, v in zip(CSV_FIELDS, row)})

If file contains:

1,2,"


4"


a,b,c

then the output will be:

{'field2': '2', 'field3': '\n\n\n4', 'field1': '1'}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': '', 'field3': '', 'field1': ''}
{'field2': 'b', 'field3': 'c', 'field1': 'a'}

Upvotes: 3

Bestasttung

Reputation: 2458

Suppose your file looked like this instead:

1,2,3
,,
,,
a,b,c

I added commas, and now the reader returns the two empty lines as {'field2': '', 'field3': '', 'field1': ''}. As for the restval argument, it only sets the value used for fields that are missing from a row, that is, when a row has fewer values than there are field names.

Here you set three fields and every row has three values, so restval is never used. It is about missing columns, not missing lines.

Your lines were completely empty, so DictReader skipped them. For it to return empty values, the lines have to contain the commas.
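
A small Python 3 sketch of that distinction, using io.StringIO instead of a file. The sample data and the "N/A" marker are made up for illustration:

```python
import csv
import io

CSV_FIELDS = ("field1", "field2", "field3")

# Row 2 has commas only, row 3 has a single value, row 4 is a blank line.
SAMPLE = "1,2,3\n,,\nx\n\na,b,c\n"

rows = list(
    csv.DictReader(io.StringIO(SAMPLE), fieldnames=CSV_FIELDS, restval="N/A")
)
for row in rows:
    print(dict(row))
```

The ",," line comes back with three empty strings because all three columns are present, the "x" line gets "N/A" for the two missing columns, and the blank line is skipped entirely.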

Upvotes: 0
