Reputation: 3891
Amazon claims their Kinesis streaming product guarantees record ordering.
It provides ordering of records, as well as the ability to read and/or replay records in the same order (...)
Kinesis is composed of Streams that are themselves composed of one or more Shards. Records are stored in these Shards. We can write consumer applications that connect to a Shard and read/replay records in the order they were stored.
But can Kinesis guarantee, out of the box, ordering for the Stream itself without pushing ordering logic to the consumers? How can a consumer read records from multiple Shards of the same Stream, making sure the records are read in the same order they were added to the Stream?
Upvotes: 32
Views: 34198
Reputation: 19793
Kinesis supports ordering per shard, but only if you write to the shard using PutRecord. The ordering is lost if you use batch writing with PutRecords.
Each data record has a unique sequence number. The sequence number is assigned by Kinesis Data Streams after you call client.putRecord to add the data record to the stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord requests, the larger the sequence numbers become.
When puts occur in quick succession, the returned sequence numbers are not guaranteed to increase because the put operations appear essentially as simultaneous to Kinesis Data Streams. To guarantee strictly increasing sequence numbers for the same partition key, use the SequenceNumberForOrdering parameter, as shown in the PutRecord Example code sample.
Whether or not you use SequenceNumberForOrdering, records that Kinesis Data Streams receives through a GetRecords call are strictly ordered by sequence number.
Writes multiple data records into a Kinesis data stream in a single call (also referred to as a PutRecords request). Use this operation to send data into the stream for data ingestion and processing.
The response Records array includes both successfully and unsuccessfully processed records. Kinesis Data Streams attempts to process all records in each PutRecords request. A single record failure does not stop the processing of subsequent records. As a result, PutRecords doesn't guarantee the ordering of records. If you need to read records in the same order they are written to the stream, use PutRecord instead of PutRecords, and write to the same shard.
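A minimal sketch of the PutRecord approach quoted above: each write passes the sequence number returned by the previous write as SequenceNumberForOrdering. The helper name `put_in_order` and its arguments are hypothetical; `client` is assumed to be a boto3 Kinesis client (or anything exposing a compatible `put_record` method), and all records share one partition key so they land on the same shard.

```python
def put_in_order(client, stream_name, partition_key, payloads):
    """Write payloads one at a time, chaining SequenceNumberForOrdering.

    `client` is assumed to behave like boto3.client("kinesis").
    Returns the sequence numbers assigned to the records, in write order.
    """
    previous_sequence = None
    sequence_numbers = []
    for payload in payloads:
        kwargs = {
            "StreamName": stream_name,
            "Data": payload,
            "PartitionKey": partition_key,
        }
        # The first record has no predecessor, so the parameter is omitted.
        if previous_sequence is not None:
            kwargs["SequenceNumberForOrdering"] = previous_sequence
        response = client.put_record(**kwargs)
        previous_sequence = response["SequenceNumber"]
        sequence_numbers.append(previous_sequence)
    return sequence_numbers


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("kinesis")
#   put_in_order(client, "my-stream", "user1", [b"data1", b"data2"])
```

The trade-off is one network round trip per record, since each call has to wait for the previous sequence number before it can be sent.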
Upvotes: 2
Reputation: 908
You don't have to handle ordering in your consumers: as long as you have set a proper partition key, they will receive the events for that key in order. Make sure your partition key preserves the order you need while distributing events evenly across the Kinesis shards. Nothing else has to be done to maintain order.
Upvotes: 0
Reputation: 668
Use the key of the related data as the partition key. For example, if you want to process user data in order per user, the partition key can be the user ID. Then user1:data1, user1:data2, and user1:data3 will be processed in order, while user2:data2 may land on a different shard and is not ordered relative to user1's records.
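To illustrate why a per-user partition key keeps each user's records together, here is a small local simulation. Kinesis maps a partition key to a 128-bit integer with MD5 and picks the shard whose hash key range contains it. The function names below are hypothetical, and the sketch assumes the shards split the hash key space evenly, which is not necessarily true of a real stream after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a partition key would map to.

    Assumption: shards divide the hash key space into equal ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def route(records, shard_count):
    """Append (key, data) pairs to simulated shards in arrival order."""
    shards = [[] for _ in range(shard_count)]
    for key, data in records:
        shards[shard_for_key(key, shard_count)].append((key, data))
    return shards


records = [
    ("user1", "data1"),
    ("user2", "data1"),
    ("user1", "data2"),
    ("user2", "data2"),
    ("user1", "data3"),
]
shards = route(records, shard_count=4)
for index, shard in enumerate(shards):
    print(index, shard)
```

Every record for a given user ends up in one shard, in the order it was written; nothing relates the positions of user1's records to user2's when they fall in different shards.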
Upvotes: 0
Reputation: 8887
If you need guaranteed order of all data in the stream you can only have one shard. That, of course, doesn't scale very well. What you need to determine is whether you really need that level of ordered data. Is all the data in the stream related to all the other data? The key is to put data in shards when the data is related. Use multiple shards to allow your data to be processed in parallel. If all related data is together in one shard you can take advantage of the guaranteed ordering. If you really need all the data to be ordered you're just going to have to deal with the limited scaling that necessarily comes with that.
Upvotes: 8
Reputation: 367
I'm not sure about this, though.
But here I guess they are saying that ordering is possible across multiple shards.
I assume a data stream is a logical grouping of shards; if that is true, then ordering across shards should be possible, I suppose.
Please check and confirm.
Upvotes: -2
Reputation: 3891
It seems this is not possible to achieve. Ordering is guaranteed at the shard level, but not across the whole stream.
https://brandur.org/kinesis-order
So back to our original question: how can we guarantee that all records are consumed in the same order in which they’re produced? The answer is that we can’t, but that we shouldn’t let that unfortunate reality bother us too much. Once we’ve scaled our stream to multiple shards, there’s no mechanism that we can use to guarantee that records are consumed in order across the whole stream; only within a single shard.
Upvotes: 28