Reputation: 19375
I have many zip files stored in my path:
mypath/data1.zip
mypath/data2.zip
Each zip file contains three different txt files. For instance, in data1.zip there is:
data1_a.txt
data1_b.txt
data1_c.txt
I need to load datai_c.txt from each zipped file (that is, data1_c.txt, data2_c.txt, data3_c.txt, etc.) and concatenate them into a dataframe.
Unfortunately I am unable to do so using read_csv, because it can only read a zip archive that contains a single file.
Any ideas how to do so? Thanks!
Upvotes: 4
Views: 5116
Reputation: 60756
So you need some other code to reach into the zip file. Below is modified code from O'Reilly's Python Cookbook:
import io
import zipfile
import pandas as pd

# make up some data for the example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')

# read every file inside the archive and collect the resulting frames
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        # z.read returns bytes, so wrap them in BytesIO for read_csv
        frames.append(pd.read_csv(io.BytesIO(z.read(filename)), sep="|"))
df = pd.concat(frames)
print(df)
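Applied to the layout in the question, the same idea might look like the sketch below. It assumes every archive sits directly in one folder and that the wanted member is the one whose name ends in _c.txt; the function name load_c_files and its suffix parameter are made up for illustration, and any extra keyword arguments are simply passed through to read_csv.

```python
import glob
import os
import zipfile

import pandas as pd


def load_c_files(folder, suffix="_c.txt", **read_csv_kwargs):
    """Concatenate the member ending in `suffix` from every zip in `folder`."""
    frames = []
    # sorted() gives a stable (lexicographic) order of the archives
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if member.endswith(suffix):
                    # ZipFile.open returns a binary file-like object,
                    # which read_csv can consume directly
                    with archive.open(member) as handle:
                        frames.append(pd.read_csv(handle, **read_csv_kwargs))
    # pd.concat raises ValueError if no matching member was found
    return pd.concat(frames, ignore_index=True)
```

With the files from the question this would be called as df = load_c_files('mypath'), adding sep="|" or similar if the txt files are not comma separated.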
Upvotes: 7
Reputation: 3897
You can work with the patool library (imported as patoolib) as follows:
import patoolib
import pandas as pd
# extract every file in the archive into mypath
patoolib.extract_archive('mypath/data1.zip', outdir='mypath', interactive=False, verbosity=-1)
Then store each txt file in a DataFrame using read_csv, as in:
df = pd.read_csv('mypath/data1_a.txt')
and then use pd.concat to concatenate the dataframes in any way you want.
Upvotes: 1