Reputation: 139
I have a .zip archive with filename.xlsx inside it, and I want to parse the Excel sheet line by line.
How do I properly pass the file to pandas.read_excel in this case?
I tried:
import zipfile
import pandas
myzip = zipfile.ZipFile('filename.zip')
for fname in myzip.namelist():
    with myzip.open(fname) as from_archive:
        with pandas.read_excel(from_archive) as fin:
            for line in fin:
                ....
but it doesn't seem to work, and the result was:
AttributeError: __exit__
Upvotes: 13
Views: 18410
Reputation: 1
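A simple way is:
df = pd.read_csv('path to file', compression='zip')
If needed, you can add extra arguments such as encoding='windows-1251' and sep=''. Note that this reads a zipped CSV file (the archive must contain a single data file); it does not parse .xlsx.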
Upvotes: -2
Reputation: 210832
You can read the zipped file into a variable in memory and parse it using io.BytesIO:
import io
from zipfile import ZipFile
import pandas as pd

def read_zip(zip_fn, extract_fn=None):
    zf = ZipFile(zip_fn)
    if extract_fn:
        return zf.read(extract_fn)
    else:
        return {name: zf.read(name) for name in zf.namelist()}
Usage:
df = pd.read_excel(io.BytesIO(read_zip(r'C:\download\test.xlsx.zip', 'test.xlsx')))
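When no member name is given, read_zip returns a dict of raw bytes per archive member. A minimal sketch of turning every Excel member into a DataFrame could look like the following; the helper name read_all_excel and the ".xlsx" suffix filter are assumptions made for illustration, not part of the original answer:

```python
import io
from zipfile import ZipFile

import pandas as pd


def read_all_excel(zip_fn):
    """Return {member name: DataFrame} for every .xlsx member of the archive.

    Hypothetical helper: members are selected by file extension only.
    """
    with ZipFile(zip_fn) as zf:
        return {
            # zf.read() returns bytes; BytesIO makes them a seekable stream
            name: pd.read_excel(io.BytesIO(zf.read(name)))
            for name in zf.namelist()
            if name.lower().endswith(".xlsx")
        }
```

Usage would then be something like frames = read_all_excel(r'C:\download\test.xlsx.zip'), followed by frames['test.xlsx'].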
Alternatively, you can extract the files from the zip file to disk and parse them as regular files.
PS: there are plenty of examples on StackOverflow showing how to extract a zip file...
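The extract-to-disk alternative might be sketched as below. This assumes a temporary directory is an acceptable place for the extracted file; the function name read_excel_via_disk is hypothetical:

```python
import tempfile
import zipfile

import pandas as pd


def read_excel_via_disk(zip_path, member):
    """Extract one archive member to a temporary directory and parse it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as zf:
            # extract() returns the path of the file it created
            extracted = zf.extract(member, path=tmp_dir)
        # Parse while the temporary directory still exists
        return pd.read_excel(extracted)
```

The DataFrame is fully loaded before the temporary directory is removed, so nothing is left on disk afterwards.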
Upvotes: 16
Reputation: 390
Using zipfile:
import zipfile
import pandas as pd
archive = zipfile.ZipFile('filename.zip', 'r')
xlfile = archive.open('filename.xlsx')
df = pd.read_excel(xlfile)
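Since the question asks for line-by-line parsing, here is a hedged sketch of iterating over the rows. pandas.read_excel returns a DataFrame, which is not a context manager, so it is assigned directly rather than used in a with statement. The function name iter_excel_rows is hypothetical, and buffering the member in io.BytesIO is an extra precaution to guarantee a seekable stream:

```python
import io
import zipfile

import pandas as pd


def iter_excel_rows(zip_path, member):
    """Yield each row of an Excel sheet stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as xlfile:
            # Copy the member into memory so the reader gets a seekable stream
            df = pd.read_excel(io.BytesIO(xlfile.read()))
    # itertuples yields one namedtuple per row, in sheet order
    yield from df.itertuples(index=False)
```

For example, for row in iter_excel_rows('filename.zip', 'filename.xlsx'): print(row) would print one tuple per sheet row.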
Upvotes: 5