Reputation: 2745
I have a use case where there will be stream of data coming and I cannot consume it at the same pace and need a buffer. This can be solved using an SNS-SQS queue. I came to know the Kinesis solves the same purpose, so what is the difference? Why should I prefer (or should not prefer) Kinesis?
Upvotes: 246
Views: 129629
Reputation: 19463
Keep in mind this answer was correct for Jun 2015
After studying the issue for a while, having the same question in mind, I found that SQS (with SNS) is preferred for most use cases unless the order of the messages is important to you (SQS doesn't guarantee FIFO on messages). (EDIT: SQS now do support FIFO queue, but then you can have on 1 consumer at a time for the queue per topic, more about it in fifo-topic-message-ordering)
There are 2 main advantages for Kinesis:
Both advantages can be achieved by using SNS as a fan out to SQS. That means that the producer of the message sends only one message to SNS, Then the SNS fans-out the message to multiple SQSs, one for each consumer application. In this way you can have as many consumers as you want without thinking about sharding capacity.
Moreover, we added one more SQS that is subscribed to the SNS that will hold messages for 14 days. In normal case no one reads from this SQS but in case of a bug that makes us want to rewind the data we can easily read all the messages from this SQS and re-send them to the SNS. While Kinesis only provides a 7 days retention.
In conclusion, SNS+SQSs is much easier and provides most capabilities. IMO you need a really strong case to choose Kinesis over it.
Upvotes: 105
Reputation: 7353
Since these services are constantly changing and improving, I summarize main gaps I found in Kinesis that do not exist in SQS.
This information is correct as of April 2023.
Kinesis enables:
aws.amazon.com/kinesis/data-streams/faqs/
Upvotes: 3
Reputation: 908
In very simple terms, and keeping costs out of the picture, the real intention of SNS-SQS are to make services loosely coupled. And this is only primary reason to use SQS where the order of the msgs are not so important and where you have more control of the messages. If you want a pattern of job queue using an SQS is again much better. Kinesis shouldn't be used be used in such cases because it is difficult to remove messages from kinesis because kinesis replays the whole batch on error. You can also use SQS as a dead letter queue for more control. With kinesis all these are possible but unheard of unless you are really critical of SQS.
If you want a nice partitioning then SQS won't be useful.
Upvotes: -1
Reputation: 1549
Kinesis Use Cases
SQS Use Cases
Upvotes: 12
Reputation: 8274
Another thing: Kinesis can trigger a Lambda, while SQS cannot. So with SQS you either have to provide an EC2 instance to process SQS messages (and deal with it if it fails), or you have to have a scheduled Lambda (which doesn't scale up or down - you get just one per minute).
Edit: This answer is no longer correct. SQS can directly trigger Lambda as of June 2018
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
Upvotes: 20
Reputation: 1751
Semantics of these technologies are different because they were designed to support different scenarios:
Let's understand the difference by example.
Once processing of one item cannot be separated from processing another one, we must have Kinesis semantics in order to handle all the cases safely.
Upvotes: 77
Reputation: 1649
The pricing models are different, so depending on your use case one or the other may be cheaper. Using the simplest case (not including SNS):
Plugging in the current prices and not taking into account the free tier, if you send 1 GB of messages per day at the maximum message size, Kinesis will cost much more than SQS ($10.82/month for Kinesis vs. $0.20/month for SQS). But if you send 1 TB per day, Kinesis is somewhat cheaper ($158/month vs. $201/month for SQS).
Details: SQS charges $0.40 per million requests (64 KB each), so $0.00655 per GB. At 1 GB per day, this is just under $0.20 per month; at 1 TB per day, it comes to a little over $201 per month.
Kinesis charges $0.014 per million requests (25 KB each), so $0.00059 per GB. At 1 GB per day, this is less than $0.02 per month; at 1 TB per day, it is about $18 per month. However, Kinesis also charges $0.015 per shard-hour. You need at least 1 shard per 1 MB per second. At 1 GB per day, 1 shard will be plenty, so that will add another $0.36 per day, for a total cost of $10.82 per month. At 1 TB per day, you will need at least 13 shards, which adds another $4.68 per day, for a total cost of $158 per month.
Upvotes: 21
Reputation: 725
Excerpt from AWS Documentation:
We recommend Amazon Kinesis Streams for use cases with requirements that are similar to the following:
Routing related records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis Streams stores data for up to 7 days, you can run the audit application up to 7 days behind the billing application.
We recommend Amazon SQS for use cases with requirements that are similar to the following:
Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout.
Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.
Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis Streams, you can scale up to a sufficient number of shards (note, however, that you'll need to provision enough shards ahead of time).
Leveraging Amazon SQS’s ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.
Upvotes: 52
Reputation: 279
I'll add one more thing nobody else has mentioned -- SQS is several orders of magnitude more expensive.
Upvotes: 5
Reputation: 46879
On the surface they are vaguely similar, but your use case will determine which tool is appropriate. IMO, if you can get by with SQS then you should - if it will do what you want, it will be simpler and cheaper, but here is a better explanation from the AWS FAQ which gives examples of appropriate use-cases for both tools to help you decide:
Upvotes: 65
Reputation: 161
Kinesis solves the problem of map part in a typical map-reduce scenario for streaming data. While SQS doesnt make sure of that. If you have streaming data that needs to be aggregated on a key, kinesis makes sure that all the data for that key goes to a specific shard and the shard can be consumed on a single host making the aggregation on key easier compared to SQS
Upvotes: 13
Reputation: 2135
Kinesis support multiple consumers capabilities that means same data records can be processed at a same time or different time within 24 hrs at different consumers, similar behavior in SQS can be achieved by writing into multiple queues and consumers can read from multiple queues. However writing again into multiple queue will add sub seconds {few milliseconds} latency in system.
Second, Kinesis provides routing capability to selective route data records to different shards using partition key which can be processed by particular EC2 instances and can enable micro batch calculation {Counting & aggregation}.
Working on any AWS software is easy but with SQS is easiest one. With Kinesis, there is a need to provision enough shards ahead of time, dynamically increasing number of shards to manage spike load and decrease to save cost also required to manage. it's pain in Kinesis, No such things are required with SQS. SQS is infinitely scalable.
Upvotes: 61
Reputation: 762
The biggest advantage for me is the fact that Kinesis is a replayable queue, and SQS is not. So you can have multiple consumers of the same messages of Kinesis (or the same consumer at different times) where with SQS, once a message has been ack'd, it's gone from that queue. SQS is better for worker queues because of that.
Upvotes: 44