Elyza Agosta

Reputation: 641

Reading zipped CSV files in Python

I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the files and read them efficiently?

Upvotes: 63

Views: 104978

Answers (10)

Arthur D. Howland

Reputation: 4557

I think Yaron had the best answer, but I wanted to add code that iterates through multiple files inside a zip archive. It reads each file into a DataFrame and then concatenates the results:

import os
import pandas as pd
import zipfile

zip_path = os.path.join(os.getcwd(), 'targetfolder.zip')
frames = []

print("Uncompressing and reading data...")

with zipfile.ZipFile(zip_path) as zf:
    for member in zf.infolist():
        if member.is_dir():
            continue  # skip directory entries; they hold no data
        print(member.filename)
        df = pd.read_csv(zf.open(member.filename))
        # do df manipulations
        frames.append(df)

# stack the per-file frames into one DataFrame
df = pd.concat(frames)

Upvotes: 9

gaius_baltar

Reputation: 107

zipfile also supports the with statement, which closes the archive for you.

So, building on Yaron's answer that uses pandas:

import zipfile
import pandas as pd

with zipfile.ZipFile('file.zip') as myZip:
    with myZip.open('file.csv') as myZipCsv:
        df = pd.read_csv(myZipCsv)

Upvotes: 9

adhg

Reputation: 10853

If your archive holds a single CSV file (for example, my_big_file.csv zipped as my_big_file.zip), you can pass the archive path straight to pandas:

df = pd.read_csv("my_big_file.zip")

Note: check your pandas version first; older versions cannot read zip archives directly.


Upvotes: 1

hughdbrown

Reputation: 49003

Suppose you are downloading a zip file that contains a CSV and you don't want to use temporary storage. Here is a sample implementation:

#!/usr/bin/env python3

from csv import DictReader
from io import TextIOWrapper, BytesIO
from zipfile import ZipFile

import requests

def all_tickers():
    url = "https://simfin.com/api/bulk/bulk.php?dataset=industries&variant=null"
    r = requests.get(url)
    zip_ref = ZipFile(BytesIO(r.content))
    for name in zip_ref.namelist():
        print(name)
        with zip_ref.open(name) as file_contents:
            reader = DictReader(TextIOWrapper(file_contents, 'utf-8'), delimiter=';')
            for item in reader:
                print(item)

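Wrapping each archive member in TextIOWrapper takes care of the Python 3 bytes/str issues: the member is opened as a byte stream, while the csv module expects text.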
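To illustrate the bytes/str point without a network call, here is a minimal sketch that builds a small archive in memory. The member name prices.csv and its contents are made up for the example; only the zipfile, io, and csv calls come from the standard library.

```python
import csv
import io
import zipfile

# Build a tiny archive in memory (hypothetical sample data, not from the answer).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("prices.csv", "ticker;price\nABC;1.5\nXYZ;2.25\n")

with zipfile.ZipFile(buffer) as zf:
    # ZipFile.open() yields a binary stream, so reading it returns bytes.
    with zf.open("prices.csv") as raw:
        first_chunk = raw.read(6)

    # The csv module needs text, so decode the stream with TextIOWrapper.
    with zf.open("prices.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text, delimiter=";"))

print(type(first_chunk).__name__)
print(rows)
```

Passing the raw stream straight to csv.DictReader would fail, because the reader would receive bytes instead of str.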

Upvotes: 4

Yaron

Reputation: 1852

I used the zipfile module to read the CSV straight from the ZIP into a pandas DataFrame. Let's say the file is named "intfile.csv" and it sits inside an archive named "THEZIPFILE.zip":

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip') 
df = pd.read_csv(zf.open('intfile.csv'))

Upvotes: 98

sandeepnaidu gottapu

Reputation: 59

This is the simplest approach, and the one I always use.

import pandas as pd
df = pd.read_csv("Train.zip", compression='zip')

Upvotes: 5

volker238

Reputation: 2260

If you aren't using pandas, it can be done entirely with the standard library. Here is Python 3.7 code:

import csv
from io import TextIOWrapper
from zipfile import ZipFile

with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        reader = csv.reader(TextIOWrapper(infile, 'utf-8'))
        for row in reader:
            # process the CSV here
            print(row)

Upvotes: 56

Hari Prasad

Reputation: 1364

A quick solution is the code below:

import pandas as pd

# pandas can read a zipped CSV directly
df = pd.read_csv("/path/to/file.csv.zip")

Upvotes: 36

Anatoly Alekseev

Reputation: 2400

pandas has read zipped CSV files natively since version 0.18.1: read_csv takes a compression parameter, one of {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, with 'infer' as the default.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
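As a rough sketch of how the compression parameter behaves, the snippet below writes the same one-file archive under two names and reads both. The file names and sample rows are invented for the example, and it assumes a pandas version with zip support is installed.

```python
import os
import tempfile
import zipfile

import pandas as pd


def write_sample_zip(path):
    # Hypothetical sample data: a single CSV member inside the archive.
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sample.csv", "id,name\n1,alpha\n2,beta\n")


with tempfile.TemporaryDirectory() as tmp:
    inferred_path = os.path.join(tmp, "sample.zip")
    opaque_path = os.path.join(tmp, "sample.dat")
    write_sample_zip(inferred_path)
    write_sample_zip(opaque_path)

    # The .zip extension lets the default compression='infer' pick the codec.
    df_inferred = pd.read_csv(inferred_path)

    # No recognizable extension here, so the codec is named explicitly.
    df_explicit = pd.read_csv(opaque_path, compression="zip")

print(df_inferred)
```

Naming the codec explicitly is mainly useful when the file extension does not reveal the format.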

Upvotes: 3

brycem

Reputation: 593

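Yes. You want the zipfile module.

You open the zip file itself with zipfile.ZipFile(file[, mode]). (zipfile.ZipInfo, by contrast, describes a single member of an archive; it does not open one.)

You can then use ZipFile.infolist() to enumerate each file within the zip, and read a member without extracting it to disk by calling ZipFile.open(name[, mode[, pwd]]), which returns a file-like object.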
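The enumerate-and-open steps can be sketched with the standard library alone. The helper name read_all_csv_members, the member names, and the UTF-8 encoding are assumptions made for this illustration.

```python
import csv
import io
import zipfile


def read_all_csv_members(archive):
    """Return {member name: list of rows} for every .csv member (hypothetical helper)."""
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a CSV.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with zf.open(info) as raw:
                # Members open as byte streams; decode before handing to csv.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[info.filename] = list(csv.reader(text))
    return tables


# Build a small in-memory archive to try the helper on (made-up contents).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
    zf.writestr("notes.txt", "not a table")
    zf.writestr("sub/b.csv", "x,y\n3,4\n")

tables = read_all_csv_members(demo)
print(sorted(tables))
```

Nothing is written to disk here: each member is decompressed on the fly as it is read.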

Upvotes: 5
