Reputation: 641
I'm trying to get data from a zipped csv file. Is there a way to do this without unzipping the whole files? If not, how can I unzip the files and read them efficiently?
Upvotes: 63
Views: 104978
Reputation: 4557
Thought Yaron had the best answer but thought I would add a code that iterated through multiple files inside a zip folder. It will then append the results:
import os
import pandas as pd
import zipfile
curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '/targetfolder.zip')
text_files = zf.infolist()
list_ = []
print ("Uncompressing and reading data... ")
for text_file in text_files:
print(text_file.filename)
df = pd.read_csv(zf.open(text_file.filename))
# do df manipulations
list_.append(df)
df = pd.concat(list_)
Upvotes: 9
Reputation: 107
zipfile also supports the with statement.
So adding onto yaron's answer of using pandas:
with zipfile.ZipFile('file.zip') as myZip:
with myZip.open('file.csv') as myZipCsv:
df = pd.read_csv(myZipCsv)
Upvotes: 9
Reputation: 10853
If you have a file name: my_big_file.csv
and you zip it with the same name my_big_file.zip
you may simply do this:
df = pd.read_csv("my_big_file.zip")
Note: check your pandas version first (not applicable for older versions)
Upvotes: 1
Reputation: 49003
Supposing you are downloading a zip file that contains a CSV and you don't want to use temporary storage. Here is what a sample implementation looks like:
#!/usr/bin/env python3
from csv import DictReader
from io import TextIOWrapper, BytesIO
from zipfile import ZipFile
import requests
def all_tickers():
url = "https://simfin.com/api/bulk/bulk.php?dataset=industries&variant=null"
r = requests.get(url)
zip_ref = ZipFile(BytesIO(r.content))
for name in zip_ref.namelist():
print(name)
with zip_ref.open(name) as file_contents:
reader = DictReader(TextIOWrapper(file_contents, 'utf-8'), delimiter=';')
for item in reader:
print(item)
This takes care of all python3 bytes/str issues.
Upvotes: 4
Reputation: 1852
I used the zipfile
module to import the ZIP directly to pandas dataframe.
Let's say the file name is "intfile" and it's in .zip named "THEZIPFILE":
import pandas as pd
import zipfile
zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip')
df = pd.read_csv(zf.open('intfile.csv'))
Upvotes: 98
Reputation: 59
this is the simplest thing I always use.
import pandas as pd
df = pd.read_csv("Train.zip",compression='zip')
Upvotes: 5
Reputation: 2260
If you aren't using Pandas it can be done entirely with the standard lib. Here is Python 3.7 code:
import csv
from io import TextIOWrapper
from zipfile import ZipFile
with ZipFile('yourfile.zip') as zf:
with zf.open('your_csv_inside_zip.csv', 'r') as infile:
reader = csv.reader(TextIOWrapper(infile, 'utf-8'))
for row in reader:
# process the CSV here
print(row)
Upvotes: 56
Reputation: 1364
A quick solution can be using below code!
import pandas as pd
#pandas support zip file reads
df = pd.read_csv("/path/to/file.csv.zip")
Upvotes: 36
Reputation: 2400
Modern Pandas since version 0.18.1 natively supports compressed csv files: its read_csv method has compression parameter : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Upvotes: 3
Reputation: 593
Yes. You want the module 'zipfile'
You open the zip file itself with zipfile.ZipInfo([filename[, date_time]])
You can then use ZipFile.infolist()
to enumerate each file within the zip, and extract it with ZipFile.open(name[, mode[, pwd]])
Upvotes: 5