The gsutil cp command supports options to compress data during transport only (with -J or -j ext), which saves network bandwidth and speeds up the copy itself.
Is there an equivalent way to do this when downloading from GCS to a local machine? That is, if I have an uncompressed text file at gs://foo/bar/file.json, is there some equivalent to -J that will compress the contents of "file.json" during transport only?
The goal is to speed up a copy from remote to local, and not just for a single file but for dozens. I'm already using -m to do parallel copies, but would like to transmit compressed data to reduce network transfer time.
I didn't find anything relevant in the docs, and including -J doesn't appear to do anything during downloads. I've tried the following, but the "ETA" numbers printed by gsutil look identical whether -J is present or absent:
gsutil cp -J gs://foo/bar/file.json .
This feature is not yet available.
As an alternative, you will need to implement your own compression step, for example in App Engine, a Cloud Function, or Cloud Run. That application compresses your files while they are still in Cloud Storage.
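A minimal Python sketch of that compression step follows. It assumes the google-cloud-storage client library and a layout where each folder gets one archive next to its files. zip_prefix and build_zip are hypothetical names, not part of any Google API. The whole folder is held in memory, so this only suits modest folder sizes.

```python
import io
import zipfile


def build_zip(files: dict[str, bytes]) -> bytes:
    """Pack several files into one in-memory ZIP archive using DEFLATE compression."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def zip_prefix(bucket_name: str, prefix: str) -> str:
    """Hypothetical Cloud Function / Cloud Run body: zip every object under
    `prefix` (e.g. "FooScripts/") and upload the archive inside that folder.

    Assumes google-cloud-storage is installed and the service account can
    read and write the bucket.
    """
    # Imported lazily so build_zip can be used without the GCS library.
    from google.cloud import storage

    prefix = prefix.rstrip("/") + "/"  # treat the prefix as a folder
    folder = prefix.rstrip("/")
    archive_name = f"{folder}/{folder.split('/')[-1]}.zip"

    client = storage.Client()
    files = {}
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name == archive_name or blob.name.endswith("/"):
            continue  # skip an earlier archive and "directory" placeholder objects
        files[blob.name[len(prefix):]] = blob.download_as_bytes()

    client.bucket(bucket_name).blob(archive_name).upload_from_string(
        build_zip(files), content_type="application/zip"
    )
    return archive_name
```

Under these assumptions, zip_prefix("foo", "FooScripts/") would write gs://foo/FooScripts/FooScripts.zip. Naming each archive after its folder keeps the names unique when they are later downloaded into one local directory.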
The ideal solution would be to use -m together with the compressed files, that is, to make parallel copies of the compressed archives rather than of every individual file. Consider the following structures. With layout [1], you are downloading each file individually. With layout [2], you would download only the compressed files.
[1]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ └───DropAll.sql
├───barconfig
│ ├───barRecreate.sh
│ └───reGenAll.sql
├───Baz
│ ├───BadBaz.sh
│ └───Drop.sh
...
[2]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ ├───DropAll.sql
│ └───FooScripts.zip
├───barconfig
│ ├───barRecreate.sh
│ ├───reGenAll.sql
│ └───barconfig.zip
├───Baz
│ ├───BadBaz.sh
│ ├───Drop.sh
│ └───Baz.zip
...
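On the local side, the archives still need to be unpacked after they are fetched. The fetch could be done with something like gsutil -m cp "gs://foo/*/*.zip" . , assuming the bucket is gs://foo as in the question and the archives follow layout [2]. Below is a minimal Python sketch of the unpacking step. extract_archives and the directory arguments are hypothetical names.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir: str, dest_dir: str) -> list[str]:
    """Unpack every *.zip in download_dir into dest_dir/<archive name without .zip>/.

    Returns the paths of the extracted files.
    """
    extracted = []
    for archive_path in sorted(Path(download_dir).glob("*.zip")):
        # e.g. FooScripts.zip -> dest_dir/FooScripts/
        target = Path(dest_dir) / archive_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target)
            extracted.extend(str(target / name) for name in archive.namelist())
    return extracted
```

Each archive is extracted into its own folder, which restores the bucket's per-folder grouping locally.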
Once your data has been downloaded, consider deleting the compressed files, since they are no longer needed for your operations and you will be charged for storing them. Instead of this workaround, you can also raise a feature request on the Public Issue Tracker. It will be sent to the Cloud Storage team, who can look into its feasibility.