Reputation: 2327
I'm using python's google-cloud
client to download a file from Google Cloud Storage (GCS), getting the following error:
File "/deploy/app/scanworker/storagehandler/gcshandler.py", line 62, in download_object
    blob.download_to_file(out_file)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 464, in download_to_file
    self._do_download(transport, file_obj, download_url, headers)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 418, in _do_download
    download.consume(transport)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/download.py", line 169, in consume
    self._write_to_stream(result)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/download.py", line 132, in _write_to_stream
    raise common.DataCorruption(response, msg)
DataCorruption: Checksum mismatch while downloading:
https://www.googleapis.com/download/storage/v1/b/<my-bucket>/o/<my-object>?alt=media
The X-Goog-Hash header indicated an MD5 checksum of:
fdn2kKmS4J6LCN6gfmEUVQ==
but the actual MD5 checksum of the downloaded contents was:
C9+ywW2Dap0gEv5gHoR1UQ==
I use the following code to download the blob from GCS:
import json

from google.cloud import storage
from google.oauth2 import service_account

bucket_name = '<some-bucket>'
service_account_key = '<path to json credential file>'

with open(service_account_key, 'r') as f:
    keyfile = json.load(f)
project_id = keyfile['project_id']

credentials = service_account.Credentials.from_service_account_file(service_account_key)
client = storage.Client(project=project_id, credentials=credentials)
bucket = client.get_bucket(bucket_name)

blob_name = '<name of blob>'
download_path = "./foo.obj"
blob = bucket.blob(blob_name)

# open in binary mode: download_to_file writes bytes
with open(download_path, "wb") as out_file:
    blob.download_to_file(out_file)  # it fails here
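One way to narrow down whether the file on disk is actually corrupt is to hash it locally and compare against the blob's md5_hash metadata (which GCS exposes as a base64 string). A hypothetical diagnostic helper, with the comparison logic kept self-contained so it can be checked without GCS access:

```python
import base64
import hashlib


def md5_matches(path, expected_b64_md5):
    """Return True if the local file's MD5 (base64-encoded, as GCS
    reports it) matches the expected value, e.g. blob.md5_hash."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii") == expected_b64_md5


# Usage after the download above (blob.reload() populates metadata
# such as blob.md5_hash from the server):
# blob.reload()
# print(md5_matches(download_path, blob.md5_hash))
```

If the local hash matches blob.md5_hash but the download still raises DataCorruption, that points at the library's in-flight checksum handling rather than the stored object.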
Some info:
I cannot reproduce the error on my local desktop when downloading the same files that fail from the Docker container.
Is this a bug in the client library, or could it be a network issue? I tried downloading different files, and they all fail with the same error from Kubernetes. The same code has been running for months without problems; I'm only seeing this error now.
Edit:
Rebuilding the Docker container from the exact same code as before seems to have fixed the problem. I'm still curious as to what caused the error in the first place, though.
Edit 2: We use CircleCI to deploy the web app to production. Now it looks like the image built on CircleCI fails, while an image built locally works. Since everything is contained in a Docker container, this is really strange; it should not matter where the image is built, should it?
Edit 3:
Logging in to the very same container in Kubernetes that produces the error above, I tried running
gsutil cp gs://<bucket>/<blob-name> foo.obj
This ran without any problem.
Upvotes: 4
Views: 5085
As pointed out by Mike in the comments, this was an issue with version 0.3.0 of the google-resumable-media library (see https://github.com/GoogleCloudPlatform/google-resumable-media-python/issues/34).
Pinning google-resumable-media==0.2.3 in our pip requirements.txt did the job!
The reason the error did not appear in the Docker image built from my desktop was that I had cached images with the old version of google-resumable-media.
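To guard against this kind of regression sneaking back in, it can help to sanity-check the installed version against the pin. A minimal sketch using a naive tuple comparison (this assumes plain X.Y.Z version strings; a real check should use a proper version parser):

```python
def version_tuple(version):
    """Naively parse an 'X.Y.Z' version string into a tuple of ints,
    so versions compare numerically rather than lexicographically."""
    return tuple(int(part) for part in version.split("."))


# The broken release sorts after the pinned one:
print(version_tuple("0.2.3") < version_tuple("0.3.0"))
```

Inside the container, running pip freeze and inspecting the google-resumable-media line is the quickest way to confirm which version the image actually installed.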
Upvotes: 1