ℕʘʘḆḽḘ

Reputation: 19375

Pandas: how to load a zip file containing multiple txt files?

I have many zip files stored in my path

Each zip file contains three different txt files. For instance, in data1.zip there is:

I need to load datai_c.txt from each zipped file (that is, data1_c.txt, data2_c.txt, data3_c.txt, etc) and concatenate them into a dataframe.

Unfortunately, I am unable to do so using read_csv, because it only handles a zip archive that contains a single file.

Any ideas how to do so? Thanks!

Upvotes: 4

Views: 5116

Answers (2)

JD Long

Reputation: 60756

So you need some other code to reach into the zip file. Below is code adapted from O'Reilly's Python Cookbook:

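import io
import zipfile
import pandas as pd

# make up some data for the example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)

# write both files into a single archive
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')

# read every member back and collect the frames
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        # z.read() returns bytes, so wrap them in BytesIO for read_csv
        frames.append(pd.read_csv(io.BytesIO(z.read(filename)), sep="|"))

df = pd.concat(frames, ignore_index=True)
print(df)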
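
Applied to the layout in the question, the same idea might look like the sketch below. It assumes all archives sit in one folder, that the wanted member's name ends in `_c.txt`, and that the delimiter is passed in by the caller; the function name `load_c_files` and its parameters are made up for illustration.

```python
import glob
import os
import zipfile

import pandas as pd


def load_c_files(folder, suffix="_c.txt", sep=","):
    """Concatenate the member ending in `suffix` from every zip in `folder`."""
    frames = []
    # sorted() keeps the archive order deterministic
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as z:
            for name in z.namelist():
                if name.endswith(suffix):
                    # z.open() gives a binary file-like object that read_csv accepts
                    with z.open(name) as member:
                        frames.append(pd.read_csv(member, sep=sep))
    # pd.concat raises ValueError on an empty list, so fail with a clearer message
    if not frames:
        raise FileNotFoundError(f"no member ending in {suffix!r} found in {folder!r}")
    return pd.concat(frames, ignore_index=True)
```

Calling `load_c_files('mypath', sep='|')` would then return one DataFrame built from data1_c.txt, data2_c.txt, and so on, without extracting anything to disk.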

Upvotes: 7

splinter

Reputation: 3897

You want to work with the patool library as follows:

import patoolib
import pandas as pd

# extract every member of the archive into 'mypath' without prompting
patoolib.extract_archive('mypath/data1.zip', outdir='mypath', interactive=False, verbosity=-1)

Then store each txt file in a DataFrame using read_csv, as in: df = pd.read_csv('mypath/data1_a')

and then use pd.concat to concatenate the dataframes in any way you want.
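
The extract-then-read steps above can be sketched end to end as follows. To keep the sketch free of extra dependencies it uses the standard library's `zipfile.extractall` in place of patool's `extract_archive`; the function name `extract_and_concat`, the `*.txt` pattern, and the temporary output folder are assumptions for illustration.

```python
import glob
import os
import tempfile
import zipfile

import pandas as pd


def extract_and_concat(zip_path, pattern="*.txt", sep=","):
    """Extract `zip_path` to a temporary folder and concatenate matching files."""
    with tempfile.TemporaryDirectory() as outdir:
        # stand-in for patoolib.extract_archive(zip_path, outdir=outdir)
        with zipfile.ZipFile(zip_path) as z:
            z.extractall(outdir)
        # read the extracted files in sorted order before the folder is removed
        paths = sorted(glob.glob(os.path.join(outdir, pattern)))
        frames = [pd.read_csv(p, sep=sep) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

Extracting into a temporary folder avoids leaving unpacked copies next to the archives; pass a narrower pattern such as `*_c.txt` to keep only one file per archive.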

Upvotes: 1
