Daniel

Large Scale Processing of S3 Images

I have roughly 80 TB of images stored in an S3 bucket that I need to send to an API for image classification. Once the images are classified, the API will forward the results to another endpoint.

Currently, I am thinking of using boto to interact with S3, and perhaps Apache Airflow, to download these images in batches and forward them to the classification API. The API will then forward the classification results to a web app for display.
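The batch approach for the existing backlog can be sketched as below. This is a minimal illustration, not a recommended pipeline: it assumes the object listing arrives as pages shaped like S3 `ListObjectsV2` responses (each with an optional `Contents` list of `{"Key": ...}` dicts), and `fetch` and `classify` are hypothetical callables you would supply. The code is stdlib-only so it runs without AWS credentials.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_keys(pages: Iterable[dict]) -> Iterator[str]:
    """Yield object keys from pages shaped like ListObjectsV2 responses."""
    for page in pages:
        # A page with no objects omits "Contents" entirely.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group an iterable into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_backlog(
    pages: Iterable[dict],
    fetch: Callable[[str], bytes],             # hypothetical: key -> image bytes
    classify: Callable[[List[bytes]], None],   # hypothetical: sends one batch to the API
    batch_size: int = 100,
) -> int:
    """Fetch each batch of images, hand it to the classifier, return the count sent."""
    sent = 0
    for keys in batched(iter_keys(pages), batch_size):
        images = [fetch(key) for key in keys]
        classify(images)
        sent += len(images)
    return sent
```

With boto3, `pages` could come from `s3.get_paginator("list_objects_v2").paginate(Bucket="my-bucket")` and `fetch` could be `lambda key: s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()`, where `my-bucket` is a placeholder. Keeping listing, batching and sending as separate functions makes it easier to split the work into parallel jobs (for example, one per key prefix) if you do schedule it with Airflow.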

In the future, I want every new image added to the S3 bucket to be sent to the API for classification automatically. To achieve this, I am hoping to use an AWS Lambda function triggered by S3 event notifications.
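The S3-to-Lambda trigger can be expressed as a bucket notification configuration. The sketch below only builds the configuration dict; the Lambda ARN and the optional `.jpg` suffix filter are placeholder assumptions.

```python
from typing import Optional


def build_notification_config(lambda_arn: str, suffix: Optional[str] = None) -> dict:
    """Build a NotificationConfiguration that invokes a Lambda on object creation."""
    rule = {
        "LambdaFunctionArn": lambda_arn,
        # Fire for every kind of create: Put, Post, Copy, CompleteMultipartUpload.
        "Events": ["s3:ObjectCreated:*"],
    }
    if suffix:
        # Only notify for keys ending in `suffix`, e.g. ".jpg".
        rule["Filter"] = {"Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}}
    return {"LambdaFunctionConfigurations": [rule]}
```

It would be applied with `boto3.client("s3").put_bucket_notification_configuration(Bucket="my-bucket", NotificationConfiguration=config)`. Note that this call replaces the bucket's entire existing notification configuration, and the Lambda function also needs a resource-based permission allowing `s3.amazonaws.com` to invoke it.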

Would this be the best practice for such a solution?

Thank you.

Answers (1)

John Rotenstein

For your future scenarios, yes, that approach would be sensible:

  • Configure Amazon S3 Events to trigger an AWS Lambda function when a new object is created
  • The Lambda function can download the object (to /tmp/) and call the remote API
  • Make sure the Lambda function deletes the temporary file before exiting, since the Lambda execution environment (including /tmp/) might be reused and /tmp/ storage is limited (512 MB by default)
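The download, call and clean-up steps above can be sketched as follows. This is an illustration under assumptions: `download` and `call_api` are injected callables (with boto3, `download` could be `s3.download_file`, whose positional arguments are bucket, key and filename; `call_api` is a hypothetical wrapper around the classification API). The `finally` block is what guarantees the temporary file is removed even when the API call fails.

```python
import os
import tempfile
from typing import Callable, Optional


def classify_object(
    bucket: str,
    key: str,
    download: Callable[[str, str, str], None],  # e.g. s3.download_file(bucket, key, path)
    call_api: Callable[[str], object],          # hypothetical: sends the file at `path`
    tmp_dir: Optional[str] = None,              # None -> default temp dir (/tmp on Lambda)
) -> object:
    """Download one object to a temp file, send it to the API, and always clean up."""
    fd, path = tempfile.mkstemp(dir=tmp_dir)
    os.close(fd)  # download() opens and writes the path itself
    try:
        download(bucket, key, path)
        return call_api(path)
    finally:
        # The execution environment (and its /tmp) may be reused, so never leave files behind.
        if os.path.exists(path):
            os.remove(path)
```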

Please note that the Lambda function will trigger on a single object, rather than in batches.
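Even though each invocation corresponds to new objects rather than a batch, the event payload still wraps them in a `Records` list, so a handler that loops over it stays correct if more than one record arrives. A minimal sketch of extracting the bucket and key follows; object keys in S3 events are URL-encoded (a space arrives as `+`), hence `unquote_plus`. The `print` is a placeholder for the download-and-classify step.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_event(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded, so decode before using them with the S3 API.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    objects = objects_from_event(event)
    for bucket, key in objects:
        # Placeholder: hand each object to the download-and-classify step here.
        print(json.dumps({"bucket": bucket, "key": key}))
    return {"processed": len(objects)}
```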
