XDProgrammer

Reputation: 861

Identify new objects in Amazon S3 at regular intervals

I have logs that are added to an S3 bucket from various sources. I want to read those logs at a regular interval, for example every 5 minutes. However, I don't want to scan all objects each time. I only need the objects added since the last time my process ran (in this case, 5 minutes ago).

For now, I have solved this using S3 events. When a new file is added to S3, it triggers a Lambda function that saves the object name in DynamoDB. A cron job then reads all the contents of that DynamoDB table, processes them, and deletes them right afterwards.

I feel like this is unnecessary overhead. I would rather query S3 directly for some sort of delta. Is this supported?

Upvotes: 0

Views: 595

Answers (1)

John Rotenstein

Reputation: 269901

Rather than using DynamoDB, you could:

  • Configure the Amazon S3 Event to create a message in an Amazon SQS queue when a new file is received
  • Your worker (presumably on an Amazon EC2 instance) can poll the SQS queue for messages (while waiting for a message, it can use Long Polling so that it does not query the queue too often)
  • When a message is received, the worker can process the file and then delete the message from the SQS queue
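
The steps above could be sketched roughly as follows. This is only an illustration, assuming the S3 event notification is delivered straight to SQS in the standard JSON format and that `sqs` is a boto3 SQS client. The names `extract_new_objects`, `drain_queue_once` and `process_object` are made up for this example.

```python
import json
from urllib.parse import unquote_plus


def extract_new_objects(message_body):
    """Return (bucket, key) pairs from one S3 event notification body."""
    event = json.loads(message_body)
    pairs = []
    # S3 also sends an "s3:TestEvent" message with no "Records" key.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def drain_queue_once(sqs, queue_url, process_object):
    """Long-poll once, process each new object, then delete the message.

    `sqs` is assumed to be boto3.client("sqs"); `process_object` is a
    hypothetical callback taking (bucket, key).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 per call
        WaitTimeSeconds=20,      # long polling: wait up to 20 s for messages
    )
    handled = 0
    for message in response.get("Messages", []):
        for bucket, key in extract_new_objects(message["Body"]):
            process_object(bucket, key)
        # Delete only after processing succeeded; if process_object raises,
        # the message becomes visible again after the visibility timeout.
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        handled += 1
    return handled
```

If you still prefer to run on a schedule, the same function could be called from your cron job in a loop until it returns 0, so that each run picks up only the objects that arrived since the previous run.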

This is a safe, loosely coupled process that handles a potential failure in the worker by keeping the notification in the queue. If the worker fails to process a message after a certain number of tries, the message can be automatically moved to a Dead Letter Queue for manual investigation.
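
The Dead Letter Queue behaviour is configured on the source queue through its `RedrivePolicy` attribute. A minimal sketch, assuming you already have the ARN of a second queue to act as the DLQ (the helper name `build_redrive_attributes` and the default of 5 tries are just illustrative choices):

```python
import json


def build_redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the Attributes dict that attaches a DLQ to a source queue.

    After `max_receive_count` failed receives, SQS moves the message
    to the queue identified by `dlq_arn`.
    """
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the policy itself as a JSON string.
    return {"RedrivePolicy": json.dumps(policy)}


# Assumed usage with a boto3 SQS client (not executed here):
#   sqs.set_queue_attributes(
#       QueueUrl=source_queue_url,
#       Attributes=build_redrive_attributes(dlq_arn),
#   )
```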

Upvotes: 2
