Benny K

Reputation: 2087

Run an image processing algorithm in Python on an S3 bucket

I am a novice with AWS services.

I have large images stored in an S3 bucket and I want to iterate over each image and do some basic image processing without downloading them (i.e., directly on the bucket), for example:

import numpy as np
import cv2
#some stuff that I don't know here....

for image in bucket:
   new_image = cv2.blur(image,(3,3))

Appreciate any help

Upvotes: 1

Views: 556

Answers (2)

Marcin

Reputation: 238209

Sadly, you can't do it "directly in the bucket". S3 is an object storage system, not a regular filesystem, and it does not let you modify its objects in place the way a regular filesystem does. To modify an object, you basically need to replace it with a new version.

Depending on the size and nature of your image processing tasks (how long they run, how much RAM and CPU they require), you can process them fairly easily. For example, you could use:

  • S3 Batch Operations, to automatically run a Lambda function for all the images in the bucket.

  • AWS Batch, useful for running image processing jobs in parallel without Lambda's limitations, but it requires a bit more setup than S3 Batch Operations.

  • Custom solutions based on EC2 instances or ECS containers. In the most basic solution, you set up one instance that processes the images from the bucket one by one.
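For the first option, a Lambda handler could look roughly like the sketch below. It assumes the 1.0 invocation schema of S3 Batch Operations (a `tasks` list with `taskId`, `s3Key` and `s3BucketArn`, where the key arrives URL-encoded). `make_handler` and `transform` are hypothetical names introduced here for illustration; `s3` is expected to be a boto3 S3 client.

```python
from urllib.parse import unquote_plus


def make_handler(s3, transform):
    """Build a Lambda handler for S3 Batch Operations (hypothetical helper).

    s3        -- a boto3 S3 client, e.g. boto3.client("s3")
    transform -- a function taking the object's bytes and returning new bytes
    """

    def handler(event, context):
        results = []
        for task in event["tasks"]:
            # Assumption: schema 1.0, where the bucket is given as an ARN
            # such as arn:aws:s3:::my-bucket and the key is URL-encoded.
            bucket = task["s3BucketArn"].split(":::")[-1]
            key = unquote_plus(task["s3Key"])
            try:
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                # The object is not edited in place; it is overwritten.
                s3.put_object(Bucket=bucket, Key=key, Body=transform(body))
                code, message = "Succeeded", key
            except Exception as exc:
                code, message = "PermanentFailure", str(exc)
            results.append(
                {"taskId": task["taskId"], "resultCode": code, "resultString": message}
            )
        return {
            "invocationSchemaVersion": event["invocationSchemaVersion"],
            "treatMissingKeysAs": "PermanentFailure",
            "invocationId": event["invocationId"],
            "results": results,
        }

    return handler


# Wiring inside the Lambda module (my_transform is a placeholder):
# lambda_handler = make_handler(boto3.client("s3"), my_transform)
```

Note that the function still downloads each object into Lambda's memory, so large images may run into Lambda's memory and timeout limits.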

Upvotes: 1

GProst

Reputation: 10227

You can't do it without downloading the image. So the process for each image would be:

  1. Download the image
  2. Process it
  3. Replace the image in the S3 bucket with the updated image
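The three steps above can be sketched with boto3 and OpenCV roughly as follows. This is a minimal sketch, not a definitive implementation: `iter_keys`, `blur_bytes` and `process_bucket` are hypothetical helper names, the 3x3 blur is taken from the question, and the images are held in memory rather than written to disk. It assumes every object under the prefix is an image OpenCV can decode.

```python
from typing import Callable, Iterator


def iter_keys(s3, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key in the bucket, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def blur_bytes(data: bytes, ext: str = ".jpg") -> bytes:
    """Decode image bytes, apply a 3x3 box blur, re-encode to `ext`."""
    # Imported here so the S3 plumbing can be used without OpenCV installed.
    import cv2
    import numpy as np

    image = cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)
    if image is None:
        raise ValueError("object is not a decodable image")
    ok, encoded = cv2.imencode(ext, cv2.blur(image, (3, 3)))
    if not ok:
        raise ValueError("could not encode image as " + ext)
    return encoded.tobytes()


def process_bucket(
    s3,
    bucket: str,
    transform: Callable[[bytes], bytes] = blur_bytes,
    prefix: str = "",
) -> int:
    """Download, process and overwrite each object; return how many were handled."""
    count = 0
    # List the keys up front so the overwrites do not interfere with listing.
    for key in list(iter_keys(s3, bucket, prefix)):
        data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # 1. download
        new_data = transform(data)                                   # 2. process
        s3.put_object(Bucket=bucket, Key=key, Body=new_data)         # 3. replace
        count += 1
    return count


# Usage (bucket name is a placeholder):
# import boto3
# process_bucket(boto3.client("s3"), "my-bucket")
```

Overwriting under the same key destroys the original unless bucket versioning is enabled, so writing the result under a different key or prefix may be the safer choice while experimenting.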

Upvotes: 1
