Reputation: 431
I have several files in my tar.gz zip file. I want to read only one of them into a pandas data frame. Is there any way to do that? Pandas can read a file inside a gz. But seems like there is no way to tell it specifically read one of them if there are several files inside the gz.
Would appreciate any thoughts. Babak
Upvotes: 1
Views: 1126
Reputation: 9400
If you use pardata, you can do this in one line:
import pardata
data = pardata.load_dataset_from_location('path-to-zip.zip')['table/csv']
The returned data
variable should be a dictionary of all csv files in the zip archive.
Disclaimer: I'm one of the main co-authors of pardata.
Upvotes: 0
Reputation: 322
To read a specific file in any compressed folder we just need to give its name or position for e.g to read a specific csv file in a zipped folder we can just open that file and read the content.
from zipfile import ZipFile
import pandas as pd
# opening the zip file in READ mode
with ZipFile("results.zip") as z:
read = pd.read_csv(z.open(z.infolist()[2].filename))
print(read)
Here the folder structure of results looks like and I want to read test.csv :
$ data_description.txt sample_submission.csv test.csv train.csv
Upvotes: 2