Eoin Clancy

Reputation: 41

Unzipping a gzip file that contains a csv

I have just hit an endpoint that returns a gzip-compressed file. I have tried saving it and extracting the CSV inside, but I keep getting encoding errors whether I decode the binary content as UTF-8 or UTF-16.

To save the gzip file, I write in binary mode:

r = requests.get(url, auth=auth, stream=True)
with gzip.open('file.gz', 'wb') as f:
    f.write(r.content)

Where r.content looks like:

b'PK\x03\x04\x14\x00\x08\x08\x08\x00f\x8dKM\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00A\x00\x00\x00RANKTRACKING_report_created_at_11_10_18_17_41-20181011-174141.csv\xec\xbdk\x8f\xe3V\x96\xae\xf9}\x80\xf9\x0f\ ... '

To extract the file manually on my machine, I first have to extract the download to a zip and then extract that zip to get the CSV. I have tried the same steps in code but ran into encoding errors there too.

I am looking for a way to pull out this CSV so I can print its lines in the Python console.

Upvotes: 0

Views: 468

Answers (1)

Mark Adler

Reputation: 112239

That's not a gzip file. That's a zip file. You are then taking the zip file that you retrieved from the URL, and trying to compress it again as a gzip file. So now you have a zip file inside a gzip file. You have moved one step further away from extracting the CSV contents, as opposed to one step closer.
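The giveaway is the first few bytes: the content shown in the question starts with b'PK\x03\x04', which is the signature of a zip local file header, whereas a gzip stream starts with b'\x1f\x8b'. A minimal sketch of that check follows; sniff_container is a hypothetical helper name, not part of any library, and it only looks at these two signatures.

```python
def sniff_container(data):
    """Guess the container format from the leading magic bytes.

    Hypothetical helper for illustration: it only distinguishes a zip
    archive that begins with a local file header from a gzip stream.
    """
    if data[:4] == b"PK\x03\x04":  # zip local file header signature
        return "zip"
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return "gzip"
    return "unknown"


# With the response from the question, this would be called as:
# sniff_container(r.content)
```

The standard library's zipfile.is_zipfile does a more thorough version of the zip check if you would rather not compare bytes by hand.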

You need to use zipfile to extract the contents of the zip file that you downloaded.
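Something along these lines should work as a starting point. It is a sketch under assumptions: read_csv_rows_from_zip is a hypothetical helper name, it takes the first member whose name ends in .csv, and it assumes the CSV is UTF-8, which the question does not confirm. If decoding fails, try another encoding such as "utf-16" or "latin-1".

```python
import csv
import io
import zipfile


def read_csv_rows_from_zip(data, encoding="utf-8"):
    """Return the rows of the first .csv member of a zip archive held in memory.

    `data` is the raw bytes of the archive (for example r.content).
    `encoding` is an assumption; the question does not state it.
    """
    # ZipFile needs a seekable file-like object, so wrap the bytes.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv member found in the archive")
        # read() returns the decompressed bytes of that member.
        text = archive.read(csv_names[0]).decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Usage with the response from the question (no gzip step at all):
# r = requests.get(url, auth=auth)
# for row in read_csv_rows_from_zip(r.content):
#     print(row)
```

Working on r.content in memory avoids writing anything to disk. If the archive is large, saving the bytes unchanged with open('file.zip', 'wb') and passing that path to zipfile.ZipFile is the alternative.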

Upvotes: 2
