Reputation: 30647
I am trying to read a bzip2-compressed CSV file in Python 3.2. For an uncompressed CSV file, this works:
datafile = open('./file.csv', mode='rt')
data = csv.reader(datafile)
for e in data: # works
    process(e)
The problem is that BZ2File only supports creating a binary stream, and in Python 3, csv.reader accepts only text streams. (The same issue occurs with gzip and zip files.)
datafile = bz2.BZ2File('./file.csv.bz2', mode='r')
data = csv.reader(datafile)
for e in data: # error
    process(e)
In particular, the indicated line throws the exception _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?).
I've also tried data = csv.reader(codecs.EncodedFile(datafile, 'utf8')), but that doesn't fix the error.
How can I wrap the binary input stream so that it can be used in text mode?
Upvotes: 4
Views: 5516
Reputation: 32923
This works for me:
import codecs, csv
f = codecs.open("file.csv", "r", "utf-8")
g = csv.reader(f)
for e in g:
    print(e)
In the case of BZ2:
import codecs, csv, bz2
f = bz2.BZ2File("./file.csv.bz2", mode="r")
# Decode the binary lines lazily so csv.reader receives str, not bytes
c = codecs.iterdecode(f, "utf-8")
g = csv.reader(c)
for e in g:
    print(e)
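As an alternative, here is a minimal sketch that assumes Python 3.3 or later rather than the 3.2 from the question. From 3.3 on, bz2.open accepts a text mode, and BZ2File objects can be wrapped with io.TextIOWrapper. The helper names read_bz2_csv and read_binary_csv and the sample file are made up for illustration.

```python
import bz2
import csv
import io
import os
import tempfile


def read_bz2_csv(path, encoding="utf-8"):
    """Return all rows of a bzip2-compressed CSV file (hypothetical helper)."""
    # newline="" leaves line-ending handling to the csv module, which
    # matters when quoted fields contain embedded newlines.
    with bz2.open(path, mode="rt", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


def read_binary_csv(binary_stream, encoding="utf-8"):
    """Parse CSV from an already-open binary stream (hypothetical helper)."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    try:
        return list(csv.reader(text))
    finally:
        # Detach so the caller stays responsible for closing the stream.
        text.detach()


# Build a small sample file so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv.bz2")
with bz2.open(sample_path, mode="wb") as out:
    out.write('name,value\nalpha,1\n"multi\nline",2\n'.encode("utf-8"))

rows_a = read_bz2_csv(sample_path)

with bz2.BZ2File(sample_path, mode="r") as raw:
    rows_b = read_binary_csv(raw)

for row in rows_a:
    print(row)
```

The same io.TextIOWrapper approach should also work for the binary streams from gzip.GzipFile or zipfile.ZipFile.open, though I have only sketched the bz2 case here.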
Upvotes: 5