Reputation: 726
I'm looking for a way to zip a (big) file stored in a Google Cloud Storage bucket and write the compressed file back to a bucket as well.
This command sequence works fast and fine:
gsutil cat gs://bucket/20190515.csv | zip | gsutil cp - gs://bucket/20190515.csv.zip
...but it has the problem that the file inside the ZIP carries the useless name "-".
On the other hand, if I use the sequence:
gsutil cp gs://bucket/20190515.csv .
zip -m 20190515.csv.zip 20190515.csv
gsutil mv 20190515.csv.zip gs://bucket/20190515.csv.zip
...then I get a usable name inside the ZIP, but the command sequence takes extremely long and needs a correspondingly large (virtual) hard disk.
Upvotes: 0
Views: 838
Reputation: 726
Thanks to meuh's advice, I now have a solution:
#!/usr/bin/python3
import sys, zipstream

# Read the file data from stdin, wrap it in a ZIP archive and write the
# archive to stdout. argv[1] is the name the file gets inside the ZIP.
with zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED) as z:
    z.write_iter(sys.argv[1], sys.stdin.buffer)
    for chunk in z:
        sys.stdout.buffer.write(chunk)
...stored as streamzip.py. Then the following call:
fn="bucket/20190515.csv"
execCmd("gsutil cat gs://%s | streamzip.py %s | gsutil cp - gs://%s.zip"%(fn, fn.split("/")[-1], fn))
...gives the desired result.
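If you want to avoid the gsutil subprocesses entirely, the same streaming idea can also be expressed with the google-cloud-storage Python client. The following is only a sketch, assuming a client library version that supports Blob.open() for streaming reads and writes (roughly 1.38 or newer), default application credentials, and the bucket/object names from the example above:

#!/usr/bin/python3
# Sketch: stream a bucket object through zipstream and back into the bucket,
# without gsutil and without touching the local disk.
# Assumes google-cloud-storage >= 1.38 (Blob.open) and default credentials.
import zipstream
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("bucket")
src = bucket.blob("20190515.csv")
dst = bucket.blob("20190515.csv.zip")

z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)

with src.open("rb") as reader, dst.open("wb") as writer:
    # Read the source in 1 MiB chunks; zipstream pulls from this iterator
    # lazily while the archive is being generated below.
    z.write_iter("20190515.csv", iter(lambda: reader.read(1024 * 1024), b""))
    for chunk in z:
        writer.write(chunk)

Note that both the reader and the writer stay open while iterating over z: write_iter only registers the iterator, and the actual data is consumed chunk by chunk as the ZIP stream is produced.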
Upvotes: 3