Reputation: 3035
I have the following Python function to write the given content to a bucket in Cloud Storage:
import gzip
from google.cloud import storage
def upload_to_cloud_storage(json):
"""Write to Cloud Storage."""
# The contents to upload as a JSON string.
contents = json
storage_client = storage.Client()
# Path and name of the file to upload (file doesn't yet exist).
destination = "path/to/name.json.gz"
# Gzip the contents before uploading
with gzip.open(destination, "wb") as f:
f.write(contents.encode("utf-8"))
# Bucket
my_bucket = storage_client.bucket('my_bucket')
# Blob (content)
blob = my_bucket.blob(destination)
blob.content_encoding = 'gzip'
# Write to storage
blob.upload_from_string(contents, content_type='application/json')
However, I receive an error when running the function:
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/name.json.gz'
Highlighting this line as the cause:
with gzip.open(destination, "wb") as f:
I can confirm that the bucket and path both exist although the file itself is new and to be written.
I can also confirm that removing the Gzipping part sees the file successfully written to Cloud Storage.
How can I gzip a new file and upload to Cloud Storage?
Other answers I've used for reference:
Upvotes: 1
Views: 3236
Reputation: 3035
Although @David's answer wasn't complete at the time of solving my problem, it got me on the right track. Here's what I ended up using along with explanations I found out along the way.
import gzip
from google.cloud import storage
from google.cloud.storage import fileio
def upload_to_cloud_storage(json_string):
"""Gzip and write to Cloud Storage."""
storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
# Filename (include path)
blob = bucket.blob('path/to/file.json')
# Set blog meta data for decompressive transcoding
blob.content_encoding = 'gzip'
blob.content_type = 'application/json'
writer = fileio.BlobWriter(blob)
# Must write as bytes
gz = gzip.GzipFile(fileobj=writer, mode="wb")
# When writing as bytes we must encode our JSON string.
gz.write(json_string.encode('utf-8'))
# Close connections
gz.close()
writer.close()
We use the GzipFile() class instead of convenience method (compress) to enable us to pass in the mode. When trying to write using w
or wt
you will receive the error:
TypeError: memoryview: a bytes-like object is required, not 'str'
So we must write in binary mode (wb
). This will also enable the .write()
method. When doing so however we need to encode our JSON string. This can be done using str.encode()
and setting it as utf-8
. Failing to do this will also result in the same error.
Finally, I wanted to be able to enable decompressive transcoding where the requester (browser in my case) will receive the uncompressed version of the file when requested. To enable this google.cloud.storage.blob
allows you to set some meta data including content_type and content_encoding so we can can follow best practices.
This sees the JSON object in memory written to your chosen destination in Cloud Storage in a compressed format and decompressed on the fly (without needing to download a gzip archive).
Thanks also to @JohnHanley for the troubleshooting advice.
Upvotes: 3
Reputation: 9721
The best solution is not to write the gzip to a file at all, and directly compress and stream to GCS.
from google.cloud import storage
from google.cloud.storage import fileio
storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('my_object')
writer = fileio.BlobWriter(blob)
gz = gzip.GzipFile(fileobj=writer, mode="w") # use "wb" if bytes
gz.write(contents)
gz.close()
writer.close()
Upvotes: 2