harshit_sharan

Reputation: 141

Hourly data flow from SQS to S3

I have a use case which has to follow the following steps:

  1. Read messages from an AWS SQS queue
  2. Process received data and enhance it with some data obtained from other pull based sources
  3. Make the enhanced data available in AWS S3, in prefixes at an hourly cadence

Basically the major ask is how and where to buffer the data for an hour, so that it is written to S3 only once per hour rather than as soon as each message is received from SQS. The buffering cannot be done in-memory, as the number of messages received will be very large.

P.S. AWS Firehose is not an option since it doesn't ensure complete de-duplication of data written to S3, i.e. if a client-side failure occurs while sending a write request to S3, the same data may be written again. We want completely non-duplicate data in S3.

Let me know of a solution to this problem, and whether there is a pre-existing tech stack and/or system that accomplishes this.

Thanks!

Upvotes: 4

Views: 5956

Answers (1)

JaredHatfield

Reputation: 6671

I recently worked on implementing an AWS Lambda function that is scheduled to run periodically using CloudWatch Events; it consumes messages from an SQS queue and sends them to Kinesis Firehose so they can be stored in S3.

I would still recommend using AWS Firehose for this use case. AWS solves lots of very complex scalability and availability problems and masks them behind a deceptively simple API.

To address your point on de-duplication, it is important to understand that You Cannot Have Exactly-Once Delivery. You can have at-least-once delivery, and you can have at-most-once delivery, but it is impossible to have exactly-once delivery. You can attempt to implement this yourself, but it will be wrong (since it is not possible). For many people, myself included, it is good enough to trust AWS's implementation, as they provide very high quality services and APIs.

As for meeting your requirements, you could schedule an AWS Lambda function to run every hour that consumes the SQS messages, performs the additional processing, and sends them along to AWS Firehose. You can configure AWS Firehose with the maximum buffering interval and size hints so that it creates the minimum number of files. This has the effect of delaying data by roughly 1 hour and 15 minutes (the hourly schedule plus Firehose's maximum 15-minute buffer), but it will create the file in S3 roughly every hour based on the interval of the CloudWatch scheduled event.
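For illustration, here is a minimal sketch of creating such a delivery stream with boto3, with the buffering hints turned up to their maximums. The stream name, bucket ARN, and role ARN are placeholders you would substitute with your own resources:

    import boto3

    firehose = boto3.client("firehose")

    # Placeholder names/ARNs -- substitute your own resources.
    firehose.create_delivery_stream(
        DeliveryStreamName="hourly-enriched-events",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::my-enriched-data-bucket",
            # With no custom prefix, Firehose writes objects under an
            # hourly UTC prefix (YYYY/MM/DD/HH/) by default, which matches
            # the hourly-prefix requirement.
            "BufferingHints": {
                "SizeInMBs": 128,         # maximum allowed size hint
                "IntervalInSeconds": 900  # maximum allowed interval (15 min)
            },
        },
    )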

This is not a pre-existing technology, but the code required to implement the AWS Lambda function is very simple. You simple read messages from SQS, do your additional enhancement to the records, write them to AWS Firehose, and finally delete the messages from SQS.

Upvotes: 4
