Reading CSV files from zip archive with python-3.x

Question

I have a zipped archive that contains several csv files.

For instance, assume myarchive.zip contains myfile1.csv, myfile2.csv, and myfile3.csv.

In Python 2.7 I was able to load each of these files into pandas in turn using

import pandas as pd
import zipfile

with zipfile.ZipFile('myarchive.zip', 'r') as zippedyear:
    for filename in ['myfile1.csv', 'myfile2.csv', 'myfile3.csv']:
        mydf = pd.read_csv(zippedyear.open(filename))

Now doing the same thing with Python 3 throws the error

ParserError: iterator should return strings, not bytes (did you open the file in text mode?)

I am at a loss here. Any idea what is the issue? Thanks!

cs95 · Accepted Answer

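Strange indeed, but it comes down to this: the only modes you can pass to ZipFile.open are 'r' and 'w', and neither is a text mode. The object you get back is always a binary file, so reading it yields bytes rather than str, which is what the parser is complaining about.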
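
To check this for yourself, here is a minimal sketch that builds a throwaway archive in memory and reads one line back. The member name demo.csv and its contents are made up for the illustration; they are not from the question.

```python
import io
import zipfile

# Build a tiny archive in memory so the check is self-contained.
# 'demo.csv' and its contents are hypothetical, for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('demo.csv', 'a,b\n1,2\n')

# Re-open the archive for reading and pull the first line of the member.
with zipfile.ZipFile(buffer, 'r') as archive:
    with archive.open('demo.csv') as member:
        first_line = member.readline()

print(type(first_line))  # <class 'bytes'>
```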

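Here's a workaround: read the member with f.read(), decode the bytes to str, load the text into a StringIO buffer, and pass that to read_csv. The example assumes the files are UTF-8 encoded; change the codec if yours are not.

import zipfile
from io import StringIO

import pandas as pd

with zipfile.ZipFile('myarchive.zip', 'r') as zippedyear:
    for filename in ['myfile1.csv', 'myfile2.csv', 'myfile3.csv']:
        with zippedyear.open(filename) as f:
            # f.read() returns bytes; decode to str first (UTF-8 is an assumption)
            mydf = pd.read_csv(StringIO(f.read().decode('utf-8')))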
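
If you would rather not read each file fully into memory first, another option is to wrap the binary member in io.TextIOWrapper, which decodes it to text as it is read. The following is only a sketch of that idea: the helper name read_csv_rows is hypothetical, UTF-8 is an assumed encoding, and it uses the standard-library csv module so that it runs without pandas installed.

```python
import csv
import io
import zipfile


def read_csv_rows(archive_file, member_name, encoding='utf-8'):
    """Return the rows of one CSV member of a zip archive as lists of str.

    read_csv_rows is a hypothetical helper; the default encoding is an assumption.
    """
    with zipfile.ZipFile(archive_file, 'r') as archive:
        with archive.open(member_name) as binary_member:
            # TextIOWrapper turns the byte stream into a text stream on the fly.
            with io.TextIOWrapper(binary_member, encoding=encoding, newline='') as text_member:
                return list(csv.reader(text_member))


# Demo on an in-memory archive with made-up contents.
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, 'w') as archive:
    archive.writestr('myfile1.csv', 'x,y\n1,2\n')

print(read_csv_rows(demo_archive, 'myfile1.csv'))  # [['x', 'y'], ['1', '2']]
```

The wrapped text handle is an ordinary file-like object, so inside the loop above you could pass it to pd.read_csv in place of the StringIO buffer.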
