Raymond

Reputation: 115

AWS SQS standard queue or FIFO queue when messages must not be duplicated?

We plan to use the AWS SQS service to queue events created by a web service and then use several workers to process those events. Each event must be processed exactly once.

According to the AWS SQS documentation, a standard queue can "occasionally" deliver duplicate messages but has unlimited throughput. A FIFO queue will not deliver duplicates but is limited to 300 API calls per second (with batchSize=10, that's 3,000 messages per second). Our current peak-hour traffic is only 80 messages per second, so both queue types meet our throughput requirement.

However, when I started using an SQS FIFO queue, I found it requires extra work: providing the "MessageGroupId" and "MessageDeduplicationId" parameters, or enabling the "ContentBasedDeduplication" setting. So I am not sure which is the better solution. We just need messages not to be duplicated; we don't need FIFO ordering.

Solution #1: Use an AWS SQS FIFO queue. For each message, generate a UUID for the "MessageGroupId" and "MessageDeduplicationId" parameters.

Solution #2: Use an AWS SQS FIFO queue with "ContentBasedDeduplication" enabled. For each message, generate a UUID only for the "MessageGroupId" parameter.
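Solutions #1 and #2 differ only in who supplies the deduplication ID. A minimal sketch of the send parameters (the helper name `build_fifo_send_args` is illustrative, not a boto3 API; the resulting dict is what you would pass to boto3's real `send_message` call):

```python
import json
import uuid

def build_fifo_send_args(queue_url, payload, content_based_dedup=False):
    """Build send_message kwargs for an SQS FIFO queue (hypothetical helper).

    With ContentBasedDeduplication enabled on the queue (Solution #2), SQS
    hashes the message body itself, so MessageDeduplicationId can be omitted.
    Otherwise (Solution #1) we supply a UUID for it.
    """
    args = {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(payload),
        # A fresh UUID per message means no ordering constraint between
        # messages, so workers can still consume in parallel.
        "MessageGroupId": str(uuid.uuid4()),
    }
    if not content_based_dedup:
        args["MessageDeduplicationId"] = str(uuid.uuid4())
    return args

# In production you would send with boto3:
#   sqs = boto3.client("sqs")
#   sqs.send_message(**build_fifo_send_args(queue_url, event))
args = build_fifo_send_args(
    "https://sqs.us-east-1.amazonaws.com/123456789012/events.fifo",
    {"event": "signup"})
print(sorted(args))
# → ['MessageBody', 'MessageDeduplicationId', 'MessageGroupId', 'QueueUrl']
```

Note that with ContentBasedDeduplication, two messages with byte-identical bodies sent within the deduplication window count as duplicates, so Solution #2 only works if every event body is unique.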

Solution #3: Use an AWS SQS standard queue with AWS ElastiCache (either Redis or Memcached). For each message, save the "MessageId" field in the cache server and check it for duplicates later on; if it already exists, the message has been processed. (By the way, how long should the "MessageId" live in the cache server? The AWS SQS documentation does not say how far apart duplicate deliveries can occur.)
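Solution #3 can be sketched with Redis's atomic `SET key value NX EX ttl` as the duplicate check. The snippet below uses a tiny in-memory stand-in so it runs without a cache server; in production you would use a `redis.Redis` client, whose `set(key, value, nx=..., ex=...)` method has the same semantics. The TTL value is an assumption, since AWS does not document the duplication window:

```python
import time

class InMemoryStore:
    """Stand-in for a Redis client, implementing just enough of
    set(..., nx=True, ex=ttl) semantics to demonstrate the idea."""
    def __init__(self):
        self._expiry = {}

    def set(self, key, value, nx=False, ex=None):
        now = time.time()
        expires = self._expiry.get(key)
        if nx and expires is not None and expires > now:
            return None  # key already exists and hasn't expired
        self._expiry[key] = now + (ex if ex is not None else float("inf"))
        return True

def seen_before(store, message_id, ttl_seconds=6 * 3600):
    """Return True if this MessageId was already claimed by a worker.

    SET ... NX EX is a single atomic operation in Redis, so two concurrent
    workers cannot both claim the same message. The 6-hour TTL is a guess:
    pick a window comfortably longer than your processing horizon, since
    AWS doesn't say how far apart standard-queue duplicates can arrive.
    """
    return store.set(f"sqs:{message_id}", "1", nx=True, ex=ttl_seconds) is None

store = InMemoryStore()
print(seen_before(store, "abc-123"))  # → False: first delivery, process it
print(seen_before(store, "abc-123"))  # → True: duplicate, skip it
```

One caveat: if a worker claims a MessageId and then crashes before finishing, the message is never retried; handling that (e.g. clearing the key on failure) adds more complexity than the FIFO options.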

Upvotes: 3

Views: 3535

Answers (2)

ketan vijayvargiya

Reputation: 5659

  • My first question would be: why is it so important that you never see duplicate messages? The ideal solution is to use a standard queue and design your workers to be idempotent. For example, if each message contains a task ID and the completed task's result is stored in a database, have workers ignore any message whose task ID already exists in the database.
  • Don't use receipt handles for application-side deduplication, because they change every time a message is received. In other words, SQS does not guarantee the same receipt handle for duplicate deliveries of a message.
  • If you insist on deduplication at the queue level, then you have to use a FIFO queue.
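The idempotent-worker idea from the first bullet can be sketched with a unique constraint doing the deduplication. This uses sqlite3 only so the example is self-contained; the table and function names are illustrative, and any database with a unique/primary-key constraint works the same way:

```python
import sqlite3

# A PRIMARY KEY on task_id makes "record the result exactly once" a
# database guarantee, so duplicate deliveries from a standard queue are
# harmless: the second INSERT fails and the message is simply ignored.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (task_id TEXT PRIMARY KEY, result TEXT)")

def do_work(message):
    return message["payload"].upper()  # placeholder for real processing

def handle(message):
    try:
        # A redelivery may redo the work, but it can never record it twice.
        db.execute("INSERT INTO results (task_id, result) VALUES (?, ?)",
                   (message["task_id"], do_work(message)))
        db.commit()
        return "processed"
    except sqlite3.IntegrityError:
        return "duplicate ignored"  # task_id already exists: a redelivery

print(handle({"task_id": "t1", "payload": "hello"}))  # → processed
print(handle({"task_id": "t1", "payload": "hello"}))  # → duplicate ignored
```

This moves the deduplication burden from the queue to the worker, which is why a standard queue's occasional duplicates stop mattering.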

Upvotes: 0

Kannaiyan

Reputation: 13025

You are making your system complicated with SQS.

We have moved to Kinesis Streams, and it works flawlessly. Here are the benefits we have seen:

  1. Order of Events
  2. Trigger an Event when data appears in stream
  3. Deliver in Batches
  4. Leave the responsibility to handle errors to the receiver
  5. Go back in time in case of issues or a buggy implementation of the process
  6. Higher performance than SQS

Hope it helps.

Upvotes: 0
