Kshitij Bhadage

Reputation: 430

Google Cloud Storage: CRC32C and MD5 mismatch while uploading a string to GCS

While trying to upload a JSON string and overwrite an existing object in a GCS bucket, I get the error below.

google.api_core.exceptions.BadRequest: 400 POST https://storage.googleapis.com/upload/storage/v1/b/cc-freshdesk/o?uploadType=multipart: {
  "error": {
    "code": 400,
    "message": "Provided CRC32C \"i8Z/Pw==\" doesn't match calculated CRC32C \"mVn0oQ==\".",
    "errors": [
      {
        "message": "Provided CRC32C \"i8Z/Pw==\" doesn't match calculated CRC32C \"mVn0oQ==\".",
        "domain": "global",
        "reason": "invalid"
      },
      {
        "message": "Provided MD5 hash \"6NMASNWhbd4WlIj/tWK4Sw==\" doesn't match calculated MD5 hash \"9H5THzsUBARmhzw5NjjgNw==\".",
        "domain": "global",
        "reason": "invalid"
      }
    ]
  }
}
: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)

Here is the code snippet:

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
config_blob = bucket.blob(destination_blob_name)
config_blob.upload_from_string(json.dumps(config_data, indent=4), content_type='text/plain')

Can anyone help me understand why this might be happening?

Upvotes: 5

Views: 3279

Answers (2)

timeau

Reputation: 43

Just in case anybody needs this nine months later: using two different blobs is usually not what you want. You often have to do many reads and writes in both directions, so I would strongly discourage that approach. You only need to refresh the blob's CRC32C checksum explicitly by calling reload():

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
config_blob = bucket.get_blob(destination_blob_name)
config_blob.reload()
config_blob.upload_from_string(json.dumps(config_data, indent=4), content_type='text/plain')
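
If you want to see which checksum went stale, it can help to compute both values locally and compare them with config_blob.md5_hash and config_blob.crc32c. This is only a rough sketch, not something taken from the client library: gcs_md5, crc32c and gcs_crc32c are helper names I made up, and the CRC32C here is a slow bitwise implementation meant for small payloads. It assumes the encoding GCS documents for these fields, i.e. base64 of the raw MD5 digest and base64 of the CRC32C in big-endian byte order.

```python
import base64
import hashlib


def gcs_md5(data: bytes) -> str:
    """Base64 of the raw 16-byte MD5 digest (hypothetical helper)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c(data: bytes) -> str:
    """Base64 of the CRC32C as 4 big-endian bytes (hypothetical helper)."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")
```

For the upload in the question, I would expect gcs_md5(json.dumps(config_data, indent=4).encode('utf-8')) to equal the "calculated" MD5 in the 400 response, assuming the server computes that value from the bytes it actually received.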

Upvotes: 4

jaym

Reputation: 470

To replicate the error you have encountered:

import json
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('some-bucket')

# blob1 object
blob1 = bucket.get_blob('file.json')

# downloads content
blob1_string = blob1.download_as_string()

# convert to dict and update content
blob1_obj = json.loads(blob1_string)
blob1_obj['some-key'] = 'some value'

# upload using same blob instance
blob1.upload_from_string(json.dumps(blob1_obj))

# raises an error like:
# Provided MD5 hash "Ax9olGoqOSb7Nay2LNkCSQ==" doesn't match calculated MD5 hash "XCMPR0o7NdgmI5zN1fMm6Q==".

You're probably using the same blob instance to both download and upload the contents. To prevent this error, create two separate blob instances, one for downloading and one for uploading:

import json
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("some-bucket")

# blob1 object -- for downloading contents
blob1 = bucket.get_blob('file.json')

blob1_string = blob1.download_as_string()
# Convert to dictionary
blob1_obj = json.loads(blob1_string)
# Add stuff
blob1_obj['some-key'] = 'some value'

# blob2 object -- for uploading contents
blob2 = bucket.get_blob('file.json')

blob2.upload_from_string(json.dumps(blob1_obj))

# no error
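
If you do this in more than one place, the same two-instance pattern can be wrapped in a small function. This is just a sketch under a few assumptions: update_json_object and its mutate callback are names I invented, bucket is expected to behave like a google.cloud.storage Bucket, and download_as_bytes() assumes a reasonably recent client library (older versions only have download_as_string()).

```python
import json


def update_json_object(bucket, name, mutate):
    """Download a JSON object, apply mutate(dict), and upload the result.

    Hypothetical helper: `bucket` should behave like
    google.cloud.storage.Bucket.
    """
    # First instance: used only for downloading.
    reader = bucket.get_blob(name)
    if reader is None:
        raise FileNotFoundError(f"object {name!r} not found in bucket")
    data = json.loads(reader.download_as_bytes())
    mutate(data)

    # Second instance: a separate handle used only for uploading.
    writer = bucket.get_blob(name)
    writer.upload_from_string(json.dumps(data))
    return data
```

Usage would look like update_json_object(bucket, 'file.json', lambda d: d.update({'some-key': 'some value'})). Note that each get_blob() call makes its own request for the object's metadata, so this costs one extra round trip per update.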

Upvotes: 11
