Reputation: 726
I'm looking for a way to zip a (big) file stored in a Google Cloud Storage bucket and write the compressed file back to a bucket as well.
This command sequence works fast and fine:
gsutil cat gs://bucket/20190515.csv | zip | gsutil cp - gs://bucket/20190515.csv.zip
...but it has the problem that the file inside the ZIP carries the useless name "-".
On the other hand, if I use the sequence:
gsutil cp gs://bucket/20190515.csv .
zip -m 20190515.csv.zip 20190515.csv
gsutil mv 20190515.csv.zip gs://bucket/20190515.csv.zip
...then I get a usable name inside the ZIP, but the command sequence takes extremely long and needs a correspondingly large (virtual) hard disk.
Upvotes: 0
Views: 838
Reputation: 726
Thanks to meuh's advice, I now have a solution:
#!/usr/bin/python3
import sys, zipstream

# Read the file data from stdin, wrap it in a ZIP archive and write the
# archive to stdout. argv[1] is the name the file gets inside the ZIP.
with zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED) as z:
    z.write_iter(sys.argv[1], sys.stdin.buffer)
    for chunk in z:
        sys.stdout.buffer.write(chunk)
...stored as streamzip.py. Then the following call:
fn="bucket/20190515.csv"
execCmd("gsutil cat gs://%s | streamzip.py %s | gsutil cp - gs://%s.zip"%(fn, fn.split("/")[-1], fn))
...gives the desired result.
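If you want to avoid the gsutil subprocesses entirely, the same streaming idea can also be expressed with the google-cloud-storage Python client. The following is only a sketch, assuming a client library version that supports Blob.open() for streaming reads and writes (roughly 1.38 or newer), default application credentials, and the bucket/object names from the example above:

#!/usr/bin/python3
# Sketch: stream a bucket object through zipstream and back into the bucket,
# without gsutil and without touching the local disk.
# Assumes google-cloud-storage >= 1.38 (Blob.open) and default credentials.
import zipstream
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("bucket")
src = bucket.blob("20190515.csv")
dst = bucket.blob("20190515.csv.zip")

z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)

with src.open("rb") as reader, dst.open("wb") as writer:
    # Read the source in 1 MiB chunks; zipstream pulls from this iterator
    # lazily while the archive is being generated below.
    z.write_iter("20190515.csv", iter(lambda: reader.read(1024 * 1024), b""))
    for chunk in z:
        writer.write(chunk)

Note that both the reader and the writer stay open while iterating over z: write_iter only registers the iterator, and the actual data is consumed chunk by chunk as the ZIP stream is produced.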
Upvotes: 3