wtian

Reputation: 318

What is the difference between Kinesis and SQS?

I know there is a lot of material online about this question, but I have not found any that explains it clearly to a rookie like me. I would appreciate it if someone could help me understand the key differences between these two services, and their use cases, with real-life examples. Thank you!

Upvotes: 12

Views: 4767

Answers (3)

Dos

Reputation: 2507

I'll try to give a simple answer based on my practical experience:

  1. Consider SQS as a temporary storage service. Use cases:

    • manage data with different queue priorities
    • store data for a limited period of time
    • dead-letter queue (DLQ) for Lambda
    • reduce costs with long polling
    • create a FIFO queue
  2. Consider Kinesis as a collector for large streams of real-time data. Use cases:

    • very large streams of data from different sources
    • back up data just by enabling Firehose (you get a data lake for free)
    • get statistics during the collection phase by integrating Kinesis Analytics
    • keep checkpoints in DynamoDB to track which records were processed or failed

Note: both services integrate very easily with Lambda functions, so there are plenty of use cases that can be solved with either SQS or Kinesis. I have tried to list the use cases where I found that one of the two performed noticeably better than the other. Hope it helps :)

Upvotes: 3

John Rotenstein

Reputation: 269091

Amazon SQS is a queue. The basic process is:

  • Messages are sent to the queue. They stay there for up to 14 days.
  • Worker programs can request a message (or up to 10 messages) from the queue.
  • When a message is retrieved from the queue:
    • It stays in the queue but is marked as invisible
    • When the worker has finished processing the message, it tells SQS to delete the message from the queue
    • If the worker does not delete the message within the queue's visibility timeout period, then the message reappears on the queue for another worker to process
    • The worker can, if desired, periodically tell SQS to keep a message invisible because it is still being processed

Thus, once a message is processed, it is deleted.
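To make the receive / hide / delete cycle above concrete, here is a minimal in-memory sketch. It is a toy model, not the SQS API: `ToyQueue`, its method names, and the explicit `now` argument (used instead of a real clock) are all made up for illustration.

```python
import itertools


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}              # message id -> [body, invisible_until]
        self._ids = itertools.count(1)

    def send(self, body):
        self._messages[next(self._ids)] = [body, 0.0]

    def receive(self, now, max_messages=1):
        """Return up to max_messages visible messages and hide them for a while."""
        batch = []
        for msg_id, entry in self._messages.items():
            if len(batch) == max_messages:
                break
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                batch.append((msg_id, entry[0]))
        return batch

    def delete(self, msg_id):
        """Called by the worker once processing has succeeded."""
        self._messages.pop(msg_id, None)

    def extend(self, msg_id, now, extra):
        """Keep a message hidden for longer while it is still being processed."""
        self._messages[msg_id][1] = now + extra


queue = ToyQueue(visibility_timeout=30)
queue.send("resize image 1")
queue.send("resize image 2")

first = queue.receive(now=0)     # worker A takes message 1
second = queue.receive(now=1)    # worker B gets message 2, because message 1 is hidden
queue.delete(second[0][0])       # worker B finishes and deletes its message
retry = queue.receive(now=31)    # worker A never deleted message 1, so it reappears
```

Here worker A stands in for a worker that crashed mid-task: its message becomes visible again after the timeout and is handed out a second time, while the message worker B deleted is gone for good.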

In Amazon Kinesis, a message is sent to a stream. The stream is divided into shards (think of them as mini-streams). When a message is received, Kinesis stores it in sequential order within its shard. Workers can then request messages from the start of the stream, or from a specific spot in the stream. For example, if a worker has already processed 5 messages, it can ask for the 6th message. The messages are retained in the stream for a period of time (24 hours by default).

I like to think of it like a film strip — each frame in a film is kept in order. You can play a film from the start, or you can fast-forward to the middle and start playing from there. In addition, you can rewind to an earlier part and watch it. The same is true for a Kinesis stream, and multiple consumers can read from various parts of the stream simultaneously.
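The film strip idea can be sketched with a plain append-only list. This is a toy model of a single shard, not the Kinesis API: `ToyStream`, `put`, and `read` are hypothetical names, and integer positions stand in for real sequence numbers.

```python
class ToyStream:
    """In-memory stand-in for a single shard: an append-only, ordered log."""

    def __init__(self):
        self._records = []

    def put(self, data):
        """Append a record and return its position in the log."""
        self._records.append(data)
        return len(self._records) - 1

    def read(self, position, limit=10):
        """Return (records, next_position) without removing anything."""
        batch = self._records[position:position + limit]
        return batch, position + len(batch)


stream = ToyStream()
for frame in ["frame-0", "frame-1", "frame-2", "frame-3"]:
    stream.put(frame)

# Consumer A plays from the start, two records at a time.
a_batch, a_pos = stream.read(0, limit=2)

# Consumer B fast-forwards to position 2; A's reads did not consume anything.
b_batch, b_pos = stream.read(2)

# Consumer A rewinds and replays the first record.
replay, _ = stream.read(0, limit=1)
```

The key point is that `read` never removes records. Each consumer only tracks its own position, which is why several consumers can read different parts of the same stream at once.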

So, which to choose?

  • If a message is used once and then discarded, a queue is probably the better choice.
  • If retaining message order is important and/or messages will be used more than once, then a stream is probably better.

Upvotes: 24

E.J. Brennan

Reputation: 46841

This article sums it up pretty nicely, imo:

https://sookocheff.com/post/aws/comparing-kinesis-and-sqs/

but basically, if you don't know which one you need, start with SQS until it can't do what you want. SQS is dead simple to set up and use, and requires almost no expertise to use well.

Kinesis takes a lot more time and expertise to set up and use, so unless you need it, don't bother, even though it could be used for many of the same things as SQS.

One big difference: with SQS, if you have multiple consumers reading from the queue, then each consumer will only ever see the messages it consumes, because other consumers are blocked from seeing them. With Kinesis, many consumers can access the stream at the same time, and each consumer sees the entire stream. So SQS is good for taking a large number of tasks and doling out pieces to lots of consumers to work on in parallel (among other things), whereas with Kinesis multiple consumers can each read the entire stream and do something with ALL of the data in it.
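A rough way to picture that difference, using only in-memory Python structures (no AWS calls; the worker and consumer names are invented for the example):

```python
from collections import deque

tasks = ["t1", "t2", "t3", "t4"]

# Queue style: two workers take turns pulling; each task goes to exactly one of them.
queue = deque(tasks)
queue_seen = {"worker-1": [], "worker-2": []}
workers = list(queue_seen)
turn = 0
while queue:
    queue_seen[workers[turn % 2]].append(queue.popleft())
    turn += 1

# Stream style: each consumer keeps its own position and reads the whole log.
log = list(tasks)
stream_seen = {}
for consumer in ["analytics", "archiver"]:
    position = 0
    stream_seen[consumer] = []
    while position < len(log):
        stream_seen[consumer].append(log[position])
        position += 1
```

In the queue case the work is split between the workers; in the stream case every consumer ends up with all four records.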

The linked article explains it better than I can.

Upvotes: 5
