nos

Reputation: 20870

Does gcloud storage python client API support parallel composite upload?

The gsutil command has options to optimize upload/download speed for large files. For example:

GSUtil:parallel_composite_upload_threshold=150M
GSUtil:sliced_object_download_max_components=8

see this page for reference.
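
These can also be set for a single command with gsutil's top-level -o flag (the bucket name below is a placeholder):

gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" cp bigfile gs://my-bucket/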

What is the equivalent in the google.cloud.storage Python API? I didn't find the relevant parameters in this document.

In general, do the client API and gsutil have a one-to-one correspondence in terms of functionality?

Upvotes: 6

Views: 1886

Answers (2)

Chris Madden

Reputation: 2660

Check out the transfer_manager module of the Python Client for Google Cloud Storage, or this sample code. It has methods for file upload/download where you pass it the object to copy along with a chunk size and a number of workers, and the module takes care of the rest.
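
For illustration, a minimal sketch of a parallel chunked upload and a sliced download with transfer_manager, assuming a recent google-cloud-storage release; the bucket and file names are placeholders, and the chunk size and worker count are arbitrary:

from google.cloud import storage
from google.cloud.storage import transfer_manager

client = storage.Client()
bucket = client.bucket("my-bucket")      # placeholder bucket name
blob = bucket.blob("large-file.bin")     # placeholder object name

# Parallel chunked upload of one large file (uses the XML Multipart Upload API).
transfer_manager.upload_chunks_concurrently(
    "large-file.bin",                    # local source path
    blob,
    chunk_size=32 * 1024 * 1024,         # 32 MiB per part
    max_workers=8,
)

# Sliced parallel download of one large object, the counterpart of
# gsutil's sliced_object_download settings.
transfer_manager.download_chunks_concurrently(
    blob,
    "large-file.out",                    # local destination path
    chunk_size=32 * 1024 * 1024,
    max_workers=8,
)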

It uses the Multipart XML API, which allows even higher parallelization than gcloud storage or gsutil, and avoids potential bucket feature interop issues too. I wrote about this and more in my recent blog post "High throughput file transfers with Google Cloud Storage (GCS)".

Upvotes: 0

DazWilkin

Reputation: 40296

I think it's not natively supported.

However (!), if you're willing to decompose files yourself and upload the parts using threading or multiprocessing, there is a compose method that should help you assemble the parts into one GCS object.
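
To make that concrete, here is a minimal sketch of that approach. The bucket and file names are placeholders and the 64 MiB part size is arbitrary; each slice is read into memory and uploaded as a temporary object on a thread pool, then stitched together server-side with compose. Note that compose accepts at most 32 sources per call, so very large files would need staged composition:

import os
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

CHUNK = 64 * 1024 * 1024  # arbitrary 64 MiB part size

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

def upload_part(path, index, offset):
    # Read one slice of the local file and upload it as a temporary part object.
    part = bucket.blob(f"parts/large-file.bin.{index:04d}")
    with open(path, "rb") as f:
        f.seek(offset)
        part.upload_from_string(f.read(CHUNK))
    return part

path = "large-file.bin"  # placeholder local file
offsets = range(0, os.path.getsize(path), CHUNK)
with ThreadPoolExecutor(max_workers=8) as pool:
    parts = list(pool.map(lambda a: upload_part(path, *a), enumerate(offsets)))

# compose() assembles the parts into the final object server-side.
final = bucket.blob("large-file.bin")
final.compose(parts)

# Clean up the temporary part objects.
for part in parts:
    part.delete()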

Ironically, gsutil is itself written in Python and uses the gslib library to implement parallel uploads. You may be able to use gslib as a template.

Upvotes: 9
