Denis

Reputation: 795

Trouble gunzipping .gz files on MacOS

The Eurostat European database provides many gzipped files, such as https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/env_air_gge.tsv.gz

When I download such files and double-click them, the Archive Utility on MacOS unzips them correctly. But when I run gunzip in the MacOS Terminal, the output file is itself a... gz file (without the extension).

I also tried the GZIP API in a custom Objective-C application, and a custom decompression (inflate) function written with zlib: both give the same result as gunzip in the Terminal application. In contrast, the same C decompression function in a Linux program works perfectly on those gzipped files.

What could be wrong with the zlib library and the command-line gunzip on MacOS that prevents them from handling some gz files correctly? The Archive Utility application included with MacOS has apparently overcome this problem...

[UPDATE] This gets stranger: when I get a gz file from another source (for example https://github.com/dhalperi/cse550-code-data/raw/master/density-peaks/rawdata/sample.csv.gz), everything is OK. So the problem lies in the Eurostat website + MacOS combination!

[[UPDATE]] Solution found:

• In the HTTP response from Eurostat, Content-Type is set to "application/octet-stream" (and the encoding is set to "gzip").

• With this setting, the server gzips the file on the fly, so the file ends up gzipped twice.

• When Content-Type is set to "application/x-gzip", the server does not gzip it, and the file can be decompressed in one pass...

• Analyzing the server response makes it possible to detect twice-gzipped files and to know when two gunzip passes are necessary.
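
The header check described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `expected_gunzip_passes` is hypothetical, the encoding is assumed to arrive in the `Content-Encoding` header, and the client is assumed to have saved the response body as-is, without undoing that encoding itself.

```python
def expected_gunzip_passes(headers, filename):
    """Guess how many gunzip passes a saved download needs (hypothetical helper).

    Assumption: the client wrote the raw response body to disk and did not
    decode Content-Encoding on its own.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}

    passes = 0
    # One layer for the file itself, which is already a .gz archive.
    if filename.lower().endswith(".gz"):
        passes += 1
    # One more layer if the server gzipped the body on the fly.
    if "gzip" in normalized.get("content-encoding", ""):
        passes += 1
    return passes
```

With the headers reported for Eurostat ("application/octet-stream" plus a gzip encoding) this returns 2 for a `.gz` file; with "application/x-gzip" and no encoding header it returns 1.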

Upvotes: 0

Views: 5120

Answers (1)

Mark Adler

Reputation: 112432

Seems fine to me. How did you download it? What is the size of the file after downloading? I get 2,989,400 bytes.

From the comments, yours is larger. It may have been gzipped multiple times. Archive Utility will keep gunzipping it until it no longer has a gzip header. You would have to do that yourself with the command-line gunzip.
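
As a rough sketch of that "keep gunzipping until there is no gzip header" approach, here is a small Python version. It is not Archive Utility's actual code; the function name `gunzip_all_layers` and the `max_layers` safety cap are assumptions made for illustration. It relies on the fact that every gzip stream starts with the two magic bytes 0x1f 0x8b.

```python
import gzip
import zlib
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gunzip_all_layers(data: bytes, max_layers: int = 10) -> Tuple[bytes, int]:
    """Decompress repeatedly while the data still looks gzipped.

    Returns the final bytes and the number of layers that were removed.
    """
    layers = 0
    while layers < max_layers and data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # The magic bytes matched but this is not a valid gzip stream,
            # so treat the current bytes as the final payload.
            break
        layers += 1
    return data, layers


if __name__ == "__main__":
    payload = b"geo\\time\t2019\n"
    twice = gzip.compress(gzip.compress(payload))
    restored, count = gunzip_all_layers(twice)
    print(count, restored == payload)
```

On the command line, the equivalent is simply to run gunzip a second time on the output of the first pass.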

Upvotes: 1
