Urvishsinh Mahida

Reputation: 1450

Poll periodically for new files in AWS S3 buckets that contain a lot of files?

I have a situation where I need to poll AWS S3 buckets for new files. Also, it's not just one bucket. There are ~1000+ buckets, and these buckets could have a lot of files. What are the usual strategies / designs for such a use case? I need to consume the new files on each poll. I cannot delete files from the buckets.

Upvotes: 3

Views: 11606

Answers (2)

Brooks

Reputation: 7380

Well, to answer this properly we would need to know what kind of application / architecture is doing the polling and consuming. That said, the 'AWS' way to do this is to have S3 send out an event notification whenever a file is created. The notification contains a reference to the S3 object and can go out to SNS, SQS, or, even better, Lambda, which can then trigger the application to spin up, consume the files, and shut down.
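To make the Lambda route concrete, here is a minimal sketch of a handler that pulls the bucket and key out of each notification record. The function names `extract_new_objects` and `handler` are my own choices, and what you do with each object is left as a placeholder; the record layout (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) follows the S3 event message structure, where object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_new_objects(event):
    """Return (bucket, key) pairs for the object-created records in an S3 event."""
    found = []
    # Test messages sent when a notification is first configured have no "Records".
    for record in event.get("Records", []):
        # Skip anything that is not a creation event (e.g. ObjectRemoved:*).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the notification (spaces arrive as "+").
        key = unquote_plus(s3_info["object"]["key"])
        found.append((bucket, key))
    return found


def handler(event, context):
    """Lambda entry point: consume each newly created object."""
    new_objects = extract_new_objects(event)
    for bucket, key in new_objects:
        # Placeholder: fetch and process the object here.
        print(f"new object: s3://{bucket}/{key}")
    return {"processed": len(new_objects)}
```

Because the event carries the bucket name, one function like this could in principle serve notifications from many buckets.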

Now, if you're going to have a LOT of files, all of those SNS/SQS notifications could get costly, and some people would then start looking at continuously polling S3 with the S3 SDK/CLI instead. Keep in mind, however, that polling has costs as well, so you should look at ways to decrease the number of files. For example, if you're using Kinesis Firehose to dump into S3, look at batching. Or you can batch the SQS messages. Try your best to stick with the event notifications; they're much more resilient.
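If you do end up polling anyway, a rough sketch of one common approach is to list each bucket and keep only objects modified after a checkpoint you store yourself. The helper name `list_new_keys` and the checkpoint handling are assumptions for illustration; the client is expected to be a boto3 S3 client (for example `boto3.client("s3")`), and `list_objects_v2` with a paginator is the real listing call. Note that this still lists every object on every poll, which is exactly the cost mentioned above.

```python
from datetime import datetime


def list_new_keys(s3_client, bucket: str, since: datetime):
    """Return (keys modified after `since`, newest LastModified seen).

    `since` must be timezone-aware, because boto3 returns aware datetimes.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    new_keys = []
    newest = since
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or empty page) has no "Contents" entry.
        for obj in page.get("Contents", []):
            modified = obj["LastModified"]
            if modified > since:
                new_keys.append(obj["Key"])
                if modified > newest:
                    newest = modified
    # Caveat: an object whose timestamp equals the checkpoint is skipped,
    # so a real poller may also want to track keys it has already seen.
    return new_keys, newest
```

You would persist the returned timestamp per bucket (in a database, for instance) and pass it back in as `since` on the next poll.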

Upvotes: 2

Mark B

Reputation: 200682

Instead of polling, you should subscribe to S3 event notifications: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

These can be delivered to an SNS topic, an SQS queue, or trigger a Lambda function.
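With ~1000+ buckets you would probably want to script the subscription rather than click through the console. Below is a minimal sketch, assuming a boto3 S3 client and a single SQS queue that receives events from every bucket; the helper name `enable_sqs_notifications` is made up for illustration. Be aware that `put_bucket_notification_configuration` replaces whatever notification configuration the bucket already has, and that the queue's access policy must allow S3 to send messages to it.

```python
def enable_sqs_notifications(s3_client, bucket_names, queue_arn):
    """Point object-created notifications from each bucket at one SQS queue."""
    config = {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                # Fire for every kind of object creation (Put, Post, Copy, ...).
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }
    for name in bucket_names:
        # Overwrites any existing notification configuration on the bucket.
        s3_client.put_bucket_notification_configuration(
            Bucket=name,
            NotificationConfiguration=config,
        )
    return config
```

A single consumer can then read the queue and tell the buckets apart from the bucket name carried in each event record.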

Upvotes: 10
