Reputation: 95
I'm creating a Cloud Function in GCP to automatically resize images uploaded to a bucket and transfer them to another bucket. Since the images arrive in batches and one folder might contain hundreds or thousands of images, is it better to handle multiple files within the function code, or to let the Cloud Function be triggered once for every uploaded image?
Upvotes: 3
Views: 1736
Reputation: 11
I know this is an old post, but I'm answering in case others are dealing with something similar. I ran into this issue as well, and when I logged the event payload, I realized that the multiple invocations were caused by temporary files written to GCS. These transient files are not even visible in the GCS bucket, but they still trigger the Cloud Function. Solution: wrap your code logic inside an if statement such as: if "SomeFileName" in event["name"]:
This blog has more info: https://medium.com/@jenn_wang/event-driven-cloud-function-triggered-multiple-times-how-to-address-it-ed8dc58a14c6
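To make the filtering idea concrete, here is a minimal sketch of such a guard. It assumes the background-function signature (event, context) and that the event dict carries the object path under "name", as in the snippet above. The helper should_process, the entry point resize_on_upload, and the suffix list are hypothetical names chosen for illustration, not part of any GCP API.

```python
def should_process(event: dict, allowed_suffixes=(".jpg", ".jpeg", ".png")) -> bool:
    """Return True only for objects that look like real image uploads.

    `allowed_suffixes` is an assumption; adjust it to the formats you expect.
    """
    name = event.get("name", "")
    # Ignore events without a name and "folder" placeholder objects.
    if not name or name.endswith("/"):
        return False
    # Anything that is not an expected image type (e.g. a temp file) is skipped.
    return name.lower().endswith(allowed_suffixes)


def resize_on_upload(event, context):
    """Hypothetical entry point for a storage-triggered function."""
    if not should_process(event):
        print(f"Skipping {event.get('name')!r}")
        return
    # The actual resize-and-copy logic would go here.
    print(f"Resizing {event['name']}")
```

Keeping the check in a small pure function makes it easy to test locally with plain dicts before deploying.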
Upvotes: 1
Reputation: 75735
Parallel processing is really powerful with serverless products because they scale up and down automatically according to your workload.
If you receive thousands of images within a few seconds, the serverless product may struggle to scale fast enough and you can lose some messages (serverless scales up quickly, but it's not magic!).
A better solution is to publish the Cloud Storage events to Pub/Sub. That way you can easily retry the failed messages.
If the number of images keeps growing, or if you want to optimize cost, I recommend having a look at Cloud Run.
You can plug a Pub/Sub push subscription into Cloud Run. The power of Cloud Run is its capacity to process several HTTP requests (Pub/Sub push message -> Cloud Storage event) on the same instance, and therefore to process several images concurrently on the same instance. If the conversion process is compute intensive, you can have up to 4 CPUs on a Cloud Run instance.
And, as with Cloud Functions, you pay only for active instances (those currently processing requests). With Cloud Functions you can process 1 request at a time per instance, therefore 1 instance per file. With Cloud Run you can process up to 1000 concurrent requests per instance, so you can reduce the number of instances, and thus your cost, by up to a factor of 1000. However, keep an eye on the CPU your processing requires: if it's compute intensive, you can't process 1000 images at the same time.
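As a rough illustration of the Pub/Sub push -> Cloud Run path, the sketch below decodes a push request body into the bucket and object name. It assumes the bucket notification uses the JSON payload format, so the base64-encoded "data" field holds the object resource and the "eventType" attribute is set. The function name parse_storage_push is hypothetical, and the HTTP framework wiring is left out.

```python
import base64
import json


def parse_storage_push(body: bytes):
    """Extract (bucket, name) from a Pub/Sub push body, or None if not applicable.

    Assumes a Cloud Storage notification with a JSON payload wrapped in the
    standard push envelope: {"message": {"data": ..., "attributes": ...}}.
    """
    envelope = json.loads(body)
    message = envelope.get("message") or {}
    attributes = message.get("attributes") or {}
    # Only react to newly created (or overwritten) objects.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    data = message.get("data")
    if not data:
        return None
    # The payload is base64-encoded JSON describing the object.
    payload = json.loads(base64.b64decode(data))
    return payload["bucket"], payload["name"]
```

In an HTTP handler you would typically return a success status once the image has been processed, and an error status on failure so that Pub/Sub redelivers the message.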
Upvotes: 2
Reputation: 50830
The finalize event is sent when a new object is created (or an existing object is overwritten, and a new generation of that object is created) in the bucket.
The function will be triggered once for each object uploaded. You can try compressing all those images into a ZIP file on the client and uploading that, so it triggers the function only once; the function then unzips the archive and uploads the images back to storage. But make sure you don't hit any of the limits mentioned in the documentation.
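The unzip step could look roughly like the sketch below, which uses only the standard library. It assumes the archive has already been downloaded into memory as bytes; the name iter_images and the suffix list are hypothetical, and the download from and upload to Cloud Storage are deliberately left out.

```python
import io
import zipfile

# Assumed set of image extensions; adjust to your data.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def iter_images(zip_bytes: bytes):
    """Yield (filename, content) for each image entry in an in-memory ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not an image.
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(IMAGE_SUFFIXES):
                continue
            yield info.filename, archive.read(info)
```

Note that this holds the whole archive in memory, so a batch of thousands of images may run into the function's memory and timeout limits.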
Upvotes: 0