BobbyF

Reputation: 431

read_csv one file from several files in a gzip?

I have several files in a tar.gz archive, and I want to read only one of them into a pandas DataFrame. Is there any way to do that? Pandas can read a file compressed with gzip, but there seems to be no way to tell it which file to read when the archive contains several.

I'd appreciate any thoughts. Babak

Upvotes: 1

Views: 1126

Answers (2)

xuhdev

Reputation: 9400

If you use pardata, you can do this in one line:

import pardata

data = pardata.load_dataset_from_location('path-to-zip.zip')['table/csv']

The returned data variable should be a dictionary of all CSV files in the zip archive.

Disclaimer: I'm one of the main co-authors of pardata.

Upvotes: 0

Nitin Kumar Singh

Reputation: 322

To read a specific file from a compressed archive, you just need to refer to it by its name or by its position in the archive. For example, to read one CSV file from a zip archive, open that member and pass the resulting file object to pandas:

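from zipfile import ZipFile
import pandas as pd

# Open the zip archive in read mode (the default)
with ZipFile("results.zip") as z:
    # Open the member by name; z.namelist() lists every member name
    with z.open("test.csv") as f:
        df = pd.read_csv(f)
    # Alternatively, select it by position in the archive's member order:
    # df = pd.read_csv(z.open(z.infolist()[2]))
    print(df)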

Here, the contents of results.zip look like this, and I want to read test.csv:

data_description.txt  sample_submission.csv  test.csv  train.csv
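The same idea carries over to the tar.gz archive from the question, using the standard-library tarfile module instead of zipfile. The following is a minimal sketch under stated assumptions: read_csv_from_tar and build_demo_archive are hypothetical helpers, the demo data is made up, and the member name must match exactly what tar.getnames() reports (archives created with the tar command often store names like "./test.csv" or "data/test.csv").

```python
import io
import os
import tarfile

import pandas as pd


def read_csv_from_tar(archive, member_name, **read_csv_kwargs):
    """Read a single CSV member of a tar archive into a DataFrame.

    Hypothetical helper. `archive` is a path or a seekable binary file
    object; mode "r:*" lets tarfile detect gzip/bz2/xz compression.
    """
    if isinstance(archive, (str, os.PathLike)):
        tar = tarfile.open(archive, mode="r:*")
    else:
        tar = tarfile.open(fileobj=archive, mode="r:*")
    with tar:
        # Raises KeyError if no member has this exact name;
        # returns None for directories and other non-regular members.
        f = tar.extractfile(member_name)
        if f is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        with f:
            return pd.read_csv(f, **read_csv_kwargs)


def build_demo_archive():
    """Build a small in-memory .tar.gz with two CSV files (demo data only)."""
    buf = io.BytesIO()
    members = {"train.csv": "a,b\n1,2\n", "test.csv": "a,b\n3,4\n5,6\n"}
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in members.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    df = read_csv_from_tar(build_demo_archive(), "test.csv")
    print(df)
```

With a real file you would call read_csv_from_tar("archive.tar.gz", "test.csv"). If you are unsure of the exact member name, open the archive with tarfile.open(path) and print tar.getnames() first. Because of mode "r:*", the same helper should also handle .tar.bz2, .tar.xz and uncompressed .tar files.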

Upvotes: 2
