Mechanical snail

Reputation: 30647

Convert binary input stream to text mode

I am trying to read a bzip2-compressed CSV file in Python 3.2. For an uncompressed CSV file, this works:

import csv
datafile = open('./file.csv', mode='rt')
data = csv.reader(datafile)
for e in data:    # works
    process(e)

The problem is that BZ2File only supports creating a binary stream, and in Python 3, csv.reader accepts only text streams. (The same issue occurs with gzip and zip files.)

import bz2, csv
datafile = bz2.BZ2File('./file.csv.bz2', mode='r')
data = csv.reader(datafile)
for e in data:    # error
    process(e)

In particular, the indicated line throws the exception _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?).

I've also tried data = csv.reader(codecs.EncodedFile(datafile, 'utf8')), but that doesn't fix the error.

How can I wrap the binary input stream so that it can be used in text mode?

Upvotes: 4

Views: 5516

Answers (1)

vz0

Reputation: 32923

This works for me:

import codecs, csv
f = codecs.open("file.csv", "r", "utf-8")
g = csv.reader(f)
for e in g:
    print(e)

In the case of BZ2:

import codecs, csv, bz2
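f = bz2.BZ2File("./file.csv.bz2", mode="r")  # binary stream; iterating yields bytes lines
c = codecs.iterdecode(f, "utf-8")            # lazily decodes each bytes line to str
g = csv.reader(c)                            # csv.reader accepts any iterable of str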
for e in g:
    print(e)
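
Another way to do the wrapping, sketched here under the assumption of Python 3.3 or later (I have not checked it on 3.2, where BZ2File may not expose the interface the wrapper needs), is io.TextIOWrapper from the standard library. The helper names rows_from_binary and demo and the sample file names are made up for illustration. The same helper should work for a gzip stream as well:

```python
import bz2
import csv
import gzip
import io
import os
import tempfile


def rows_from_binary(stream, encoding="utf-8"):
    """Decode a binary file object and return its CSV rows as lists of str."""
    # newline="" leaves line endings untouched so the csv module can parse
    # quoted fields that contain embedded newlines.
    # Closing the wrapper also closes the underlying binary stream.
    with io.TextIOWrapper(stream, encoding=encoding, newline="") as text:
        return list(csv.reader(text))


def demo():
    """Write a tiny sample to temporary .bz2 and .gz files and read it back."""
    sample = 'name,city\nZoë,"Paris, FR"\n'.encode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        bz2_path = os.path.join(tmp, "sample.csv.bz2")  # hypothetical file name
        gz_path = os.path.join(tmp, "sample.csv.gz")    # hypothetical file name
        with bz2.BZ2File(bz2_path, mode="w") as out:
            out.write(sample)
        with gzip.GzipFile(gz_path, mode="wb") as out:
            out.write(sample)
        from_bz2 = rows_from_binary(bz2.BZ2File(bz2_path, mode="r"))
        from_gzip = rows_from_binary(gzip.GzipFile(gz_path, mode="rb"))
    return from_bz2, from_gzip


if __name__ == "__main__":
    for rows in demo():
        print(rows)
```

On 3.3+ there is also bz2.open(path, "rt", encoding="utf-8", newline=""), which opens the file in text mode directly, so the explicit wrapper is mostly useful when you are handed an already-open binary stream.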

Upvotes: 5
