Reputation: 12254
I am using the Google Cloud Storage package for Python to copy a bunch of files from one bucket to another. Basic code is:
from google.cloud.storage.client import Client

def copy_bucket_content(client: Client, source_bucket_name, destination_bucket_name, source_dir):
    source_bucket = client.get_bucket(source_bucket_name)
    destination_bucket = client.get_bucket(destination_bucket_name)
    blobs_to_copy = [blob for blob in source_bucket.list_blobs() if blob.name.startswith(source_dir)]
    for blob in blobs_to_copy:
        print("copying {blob}".format(blob=blob.name))
        source_bucket.copy_blob(blob, destination_bucket, blob.name)
When I pass a source_dir that has many blobs in it, the script fails at runtime with:
File "/Users/jamiet/.virtualenvs/hive-data-copy-biEl4iRK/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 POST https://www.googleapis.com/storage/v1/b/path/to/blob/copyTo/b/path/to/blob: Backend Error
This invariably occurs after transferring between 50 and 80 blobs (it doesn't fail at the same point each time).
I am assuming that I'm hitting some sort of API request limit. Would that be the case?
If so, how do I get around this? I suppose lifting the restriction would be one way, but better still would be to issue just one call to the REST API rather than looping over all the blobs and copying them one at a time. I searched around the GCS Python package but didn't find anything that might help.
I assume there's a better way of accomplishing this, but I don't know what it is. Can anyone help?
Upvotes: 0
Views: 1716
Reputation: 2883
There's no quota restriction regarding this scenario. Error 500 indicates a server-side issue. You could use an exponential backoff strategy, as described in the Handling errors documentation, as well as follow the best practices for uploading data.
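As a minimal sketch, assuming the failures are transient 500 responses like the one in your traceback, you could wrap each copy in a small retry helper with exponential backoff. The helper name, attempt count, and delay values below are illustrative choices, not anything provided by the client library:

import time

from google.api_core import exceptions


def copy_blob_with_backoff(source_bucket, blob, destination_bucket,
                           max_attempts=5, base_delay=1.0):
    # Retry transient server-side failures with exponential backoff.
    for attempt in range(max_attempts):
        try:
            return source_bucket.copy_blob(blob, destination_bucket, blob.name)
        except exceptions.InternalServerError:
            if attempt == max_attempts - 1:
                raise
            # Sleep 1s, 2s, 4s, ... before the next attempt.
            time.sleep(base_delay * (2 ** attempt))

In your existing loop you would then call copy_blob_with_backoff(source_bucket, blob, destination_bucket) instead of calling copy_blob directly.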
Upvotes: 1