Felix Hofstätter

Reputation: 33

How do I use a cloud function to unzip a large file in cloud storage?

I have a cloud function which is triggered when a zip is uploaded to cloud storage and is supposed to unpack it. However, the function runs out of memory, presumably because the unzipped file is too large (~2.2 GB). What are my options for dealing with this problem? I read that it's possible to stream large files into cloud storage, but I don't know how to do this from a cloud function or while unzipping. Any help would be appreciated.

Here is the code of the cloud function so far:

    import io
    from zipfile import ZipFile, is_zipfile

    from google.cloud import storage

    storage_client = storage.Client()
    bucket = storage_client.get_bucket("bucket-name")

    destination_blob_filename = "large_file.zip"

    blob = bucket.blob(destination_blob_filename)
    # The whole archive is downloaded into memory here...
    zipbytes = io.BytesIO(blob.download_as_string())

    if is_zipfile(zipbytes):
        with ZipFile(zipbytes, 'r') as myzip:
            for contentfilename in myzip.namelist():
                # ...and each member is also read fully into memory before upload.
                contentfile = myzip.read(contentfilename)
                blob = bucket.blob(contentfilename)
                blob.upload_from_string(contentfile)

Upvotes: 2

Views: 4192

Answers (1)

guillaume blaquiere

Reputation: 75725

Your target process is risky: streaming the zip out of Cloud Storage while unzipping it, and streaming the extracted content back in, means neither transfer can be checksum-validated.

Thus, you have two successful operations without any checksum validation!

Until Cloud Functions or Cloud Run are available with more memory, you can use a Dataflow template to unzip your files.
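
For reference, the streaming approach the question asks about (and that the warning above applies to) could look roughly like the sketch below. It is only a sketch under assumptions: it relies on the file-like Blob.open() API available in recent versions of google-cloud-storage, the bucket name is a placeholder, and the function name and trigger payload field are the usual ones for a background Cloud Function on a storage trigger. Because both sides are streamed, neither the download nor the upload is checksum-validated end to end, which is exactly the risk described above.

    import shutil
    from zipfile import ZipFile, is_zipfile

    from google.cloud import storage


    def unzip_streaming(event, context):
        """Sketch: copy each zip member in chunks instead of holding the
        whole archive and its contents in memory."""
        storage_client = storage.Client()
        bucket = storage_client.get_bucket("bucket-name")  # placeholder bucket

        zip_blob = bucket.blob(event["name"])  # object name from the storage trigger

        # Blob.open("rb") returns a seekable, file-like reader that downloads
        # byte ranges on demand, so ZipFile can locate the central directory
        # without pulling the whole 2+ GB archive into memory.
        with zip_blob.open("rb") as zip_stream:
            if not is_zipfile(zip_stream):
                return
            zip_stream.seek(0)
            with ZipFile(zip_stream) as archive:
                for member_name in archive.namelist():
                    target = bucket.blob(member_name)
                    # Stream the member straight into the destination object,
                    # one buffer at a time, instead of reading it fully first.
                    with archive.open(member_name) as src, target.open("wb") as dst:
                        shutil.copyfileobj(src, dst)

Even then, memory is only one constraint: a Cloud Function also has a request timeout, so for a 2+ GB archive the Dataflow route (or Cloud Run with more memory) suggested above may be the more robust choice.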

Upvotes: 4
