Reputation: 19405
I have a zipped archive that contains several csv files. For instance, assume myarchive.zip contains myfile1.csv, myfile2.csv, and myfile3.csv. In Python 2.7 I was able to iteratively load all of these files into pandas using:
import pandas as pd
import zipfile

with zipfile.ZipFile('myarchive.zip', 'r') as zippedyear:
    for filename in ['myfile1.csv', 'myfile2.csv', 'myfile3.csv']:
        mydf = pd.read_csv(zippedyear.open(filename))
Now doing the same thing with Python 3 throws the error:
ParserError: iterator should return strings, not bytes (did you open the file in text mode?)
I am at a loss here. Any idea what the issue is? Thanks!
Upvotes: 4
Views: 3567
Reputation: 403198
Strange indeed, since the only modes you can pass to ZipFile.open are 'r' and 'w', and neither is a text mode: the file object you get back always yields bytes.
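To see where the bytes come from, here is a minimal sketch that builds a tiny archive in memory (the contents are made up for illustration) and inspects what ZipFile.open hands back in Python 3:

```python
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained.
# The file name and contents are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('myfile1.csv', 'a,b\n1,2\n')

# Reopen the archive and read the first line of the member.
with zipfile.ZipFile(buffer, 'r') as archive:
    with archive.open('myfile1.csv') as member:
        first_line = member.readline()

print(type(first_line))  # the member is a binary stream, so this is bytes
```

The member behaves like a file opened in binary mode, which is consistent with the "strings, not bytes" complaint in the error message.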
Here's a workaround: read the file using f.read(), decode the bytes to text, load the result into a StringIO buffer, and pass that to read_csv.

from io import StringIO

with zipfile.ZipFile('myarchive.zip', 'r') as zippedyear:
    for filename in ['myfile1.csv', 'myfile2.csv', 'myfile3.csv']:
        with zippedyear.open(filename) as f:
            # f.read() returns bytes; decode first (UTF-8 is an assumption)
            mydf = pd.read_csv(StringIO(f.read().decode('utf-8')))
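If you would rather not read each member fully into memory, another option is to wrap the binary stream in io.TextIOWrapper. The sketch below is only an illustration: open_text_member is a hypothetical helper name, the archive is built in memory with made-up contents, UTF-8 is assumed as the encoding, and the standard csv module stands in for pandas so the snippet runs without extra dependencies.

```python
import csv
import io
import zipfile


def open_text_member(archive, name, encoding='utf-8'):
    """Return a text-mode view of a zip member (hypothetical helper).

    The encoding default is an assumption; adjust it to match your files.
    """
    return io.TextIOWrapper(archive.open(name), encoding=encoding, newline='')


# Build a small archive in memory; names and contents are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('myfile1.csv', 'a,b\n1,2\n')
    archive.writestr('myfile2.csv', 'a,b\n3,4\n')

rows_by_file = {}
with zipfile.ZipFile(buffer, 'r') as zippedyear:
    for filename in ['myfile1.csv', 'myfile2.csv']:
        with open_text_member(zippedyear, filename) as text_file:
            # text_file yields str lines, which is what a CSV parser expects.
            rows_by_file[filename] = list(csv.reader(text_file))

print(rows_by_file['myfile1.csv'])
```

The same text-mode wrapper can be passed to pd.read_csv in place of the StringIO buffer.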
Upvotes: 6