Royal

Reputation: 13

CSV reader object not reading entire file [Python]

I am currently working on a project that uses the csv module in Python. I have created a separate class to open a pre-existing CSV file, modify the data on each line, and then save the data to a new CSV file.

The original file has 1438 rows, and some test code placed in the class that handles the writing indicates that it is writing 1438 rows to the new CSV file. Upon inspection of the file itself, there are in fact 1438 rows in the newly created file. However, when I use the standard csv module in this way:

reader = csv.reader(open('naiveData.csv', 'rb'))

it only reads up to row 1410 (and not even the entire row; it stops one and a half fields before the end of that row). I am not sure what may be causing this.

This is how I am accessing the reader:

for row in reader:
    print row

Here is the part of the output where it fails:

['UNPM', '16', '2.125', '910', 'athlete', 'enrolled'] 
['UNPM', '14', '2.357', '1020', 'non-athlete', 'enrolled']    
['UNDC', '17', '2.071', '910', 'athlete', 'unenrolled']  
['KINS', '15', '2.6', '910', 'athlete', 'enrolled']  
['PHYS', '16', '1.5', '900', 'non-']

The last list should be ['PHYS', '16', '1.5', '900', 'non-athlete', 'enrolled'].

Any ideas as to what may be causing this? Thanks in advance!

Edit:

Here are the lines in the CSV file around the area where the error is occurring:

KINS,15,2.6,910,athlete,enrolled
PHYS,16,1.5,900,non-athlete,enrolled
UNPL,15,3,960,non-athlete,enrolled

Upvotes: 1

Views: 7883

Answers (2)

Galuoises

Reputation: 3303

I had a similar issue, and eventually the problem turned out to be a comma missing in one row of the CSV file.
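
If you suspect a malformed row, here is a minimal sketch (not the accepted explanation below, just a check, reusing the questioner's file name naiveData.csv and assuming the first row has the correct number of fields) that reports any row whose field count differs from the first row:

import csv

with open('naiveData.csv', 'rb') as f:
    reader = csv.reader(f)
    first = next(reader)
    # enumerate from 2 so the reported number matches the file's line number
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(first):
            print 'line %d has %d fields, expected %d' % (lineno, len(row), len(first))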

Upvotes: 0

abarnert

Reputation: 366213

I'm willing to bet this is the problem, although it's hard to be sure when you've only shown us 3 lines of code instead of a reproducible example.

You're doing something like this:

old_reader = csv.reader(open('old.csv', 'rb'))
writer = csv.writer(open('new.csv', 'wb'))
for row in old_reader:
    writer.writerow(transform(row))
new_reader = csv.reader(open('new.csv', 'rb'))
for row in new_reader:
    print row

At the time you open new.csv for reading, you haven't yet closed new.csv for writing. So the last buffer hasn't been flushed to disk. So you can't see it.

But then, when your script finishes, the writer goes out of scope, the file object no longer has any references, so it gets flushed and closed. So when you inspect it from outside of the program, after the script finishes, now it's complete. (Note that this behavior is explicitly not guaranteed; you're just getting lucky.)
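
A quick way to confirm that diagnosis (a minimal sketch, reusing the hypothetical names from the snippet above) is to keep a reference to the output file and close it explicitly before reopening new.csv:

outf = open('new.csv', 'wb')
writer = csv.writer(outf)
for row in old_reader:
    writer.writerow(transform(row))
outf.close()  # forces the last buffer out to disk before new.csv is read back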

And this is why you should never leak files by just putting an open in the middle of an expression. Use a with statement instead. For example:

with open('old.csv', 'rb') as oldf, open('new.csv', 'wb') as newf:
    old_reader = csv.reader(oldf)
    writer = csv.writer(newf)
    for row in old_reader:
        writer.writerow(transform(row))
with open('new.csv', 'rb') as newf:
    new_reader = csv.reader(newf)
    for row in new_reader:
        print row

Upvotes: 6
