Stevie

Reputation: 199

CSV Silently Not Reading All Lines on Python on Windows

I'm trying to read all lines of a TSV file to a list. However, the TSV reader is terminating early and not reading the whole file. I know this because data is only 1/6 of the length of the whole file. No errors are thrown when this happens.

When I manually inspect the line it terminates on (corresponding to the length of data), I see that those lines have tons of Unicode symbols. I thought I could catch a UnicodeDecodeError, but instead of throwing an error, it quits reading the file entirely. I imagine it's hitting something that triggers an end-of-file?

What's really throwing me for a loop: the error only occurs when I'm using Python 2.7 on Windows Server 2012. The file reads 100% perfectly on Unix implementations of Python 2.7 using both code snippets below. I'm running this inside Anaconda on both.

Here's what I've tried; neither approach works:

import csv

data = []

with open('data.tsv', 'r') as infile:
    # Strip NUL bytes from each line before handing it to the csv reader
    csvreader = csv.reader((x.replace('\0', '') for x in infile),
                           delimiter='\t', quoting=csv.QUOTE_NONE)

    data = list(csvreader)

I also tried reading line by line...

with open('data.tsv', 'r') as infile:
    for line in infile:
        try:
            d = line.split('\t')
            q = d[0].decode('utf-8')  # where the Unicode symbols are located
            data.append(d)
        except UnicodeDecodeError:
            continue

Thanks in advance!

Upvotes: 0

Views: 55

Answers (1)

zwer

Reputation: 25779

As the csv module documentation recommends:

If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.

So open your file with:

with open('data.tsv', 'rb') as infile:
    csvreader = csv.reader(infile, delimiter='\t', quoting=csv.QUOTE_NONE)
    data = list(csvreader)
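
A likely reason binary mode matters here (an assumption, since the question does not show the offending bytes): on Windows, Python 2 text mode treats the byte 0x1A (Ctrl-Z) as end-of-file, so a stray 0x1A inside the data would stop reading silently, while Unix has no such rule. The sketch below checks for that byte; find_ctrl_z is a hypothetical helper name, not part of any library.

```python
def find_ctrl_z(path, chunk_size=1 << 20):
    """Return the byte offset of the first 0x1A byte in the file, or -1.

    The file is opened in binary mode so that no platform-specific
    end-of-file or newline handling is applied while scanning.
    """
    offset = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1  # reached the real end of the file
            idx = chunk.find(b'\x1a')
            if idx != -1:
                return offset + idx
            offset += len(chunk)
```

If find_ctrl_z('data.tsv') returns an offset near the point where reading stopped, that would support this explanation.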

Also, you will have to decode your strings yourself if they contain Unicode data, or you can use unicodecsv as a drop-in replacement so you don't have to worry about it.
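
For the manual route, a minimal sketch of decoding each row after it has been read in binary mode, assuming the file is UTF-8 encoded (the question suggests this but does not confirm it); decode_row is a hypothetical helper name.

```python
def decode_row(row, encoding='utf-8'):
    """Decode every byte-string cell in a parsed row to Unicode text.

    Cells that are already text are passed through unchanged.
    Raises UnicodeDecodeError if a cell is not valid in the given encoding.
    """
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]
```

On Python 2.7 this could be applied as data = [decode_row(r) for r in csvreader] inside the with block.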

Upvotes: 1
