Dante

Reputation: 3891

Amazon Kinesis and guaranteed ordering

Amazon claims their Kinesis streaming product guarantees record ordering.

It provides ordering of records, as well as the ability to read and/or replay records in the same order (...)

Kinesis is composed of Streams that are themselves composed of one or more Shards. Records are stored in these Shards. We can write consumer applications that connect to a Shard and read/replay records in the order they were stored.

But can Kinesis guarantee, out of the box, ordering for the Stream itself without pushing ordering logic to the consumers? How can a consumer read records from multiple Shards of the same Stream, making sure the records are read in the same order they were added to the Stream?

Upvotes: 32

Views: 34198

Answers (6)

Leeroy Hannigan

Reputation: 19793

Kinesis Ordering

Kinesis supports ordering per shard, but only if you write to the shard using PutRecord. Ordering is lost if you use the batch operation PutRecords.

PutRecord

Each data record has a unique sequence number. The sequence number is assigned by Kinesis Data Streams after you call client.putRecord to add the data record to the stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord requests, the larger the sequence numbers become.

When puts occur in quick succession, the returned sequence numbers are not guaranteed to increase because the put operations appear essentially as simultaneous to Kinesis Data Streams. To guarantee strictly increasing sequence numbers for the same partition key, use the SequenceNumberForOrdering parameter, as shown in the PutRecord Example code sample.

Whether or not you use SequenceNumberForOrdering, records that Kinesis Data Streams receives through a GetRecords call are strictly ordered by sequence number.

src
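The chaining described above can be sketched as follows. This is a minimal illustration, not the code sample the documentation refers to: the helper name `put_records_in_order` is hypothetical, and `client` is assumed to expose the `put_record` method of boto3's Kinesis client, which accepts `StreamName`, `Data`, `PartitionKey` and `SequenceNumberForOrdering` and returns a `SequenceNumber`.

```python
def put_records_in_order(client, stream_name, partition_key, payloads):
    """Write payloads one at a time, chaining sequence numbers.

    `client` is assumed to behave like boto3's Kinesis client
    (hypothetical helper, for illustration only).
    """
    previous_seq = None
    sequence_numbers = []
    for payload in payloads:
        kwargs = {
            "StreamName": stream_name,
            "Data": payload,
            "PartitionKey": partition_key,
        }
        if previous_seq is not None:
            # Pass the sequence number of the previous record so this
            # record sorts after it for the same partition key.
            kwargs["SequenceNumberForOrdering"] = previous_seq
        response = client.put_record(**kwargs)
        previous_seq = response["SequenceNumber"]
        sequence_numbers.append(previous_seq)
    return sequence_numbers
```

With boto3 installed, `client` would typically be created with `boto3.client("kinesis")`. Each call waits for the previous one to return, so this trades throughput for ordering.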

PutRecords

Writes multiple data records into a Kinesis data stream in a single call (also referred to as a PutRecords request). Use this operation to send data into the stream for data ingestion and processing.

The response Records array includes both successfully and unsuccessfully processed records. Kinesis Data Streams attempts to process all records in each PutRecords request. A single record failure does not stop the processing of subsequent records. As a result, PutRecords doesn't guarantee the ordering of records. If you need to read records in the same order they are written to the stream, use PutRecord instead of PutRecords, and write to the same shard.

src
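To see why a partial failure disturbs ordering, here is a small sketch that separates succeeded and failed entries of a PutRecords call. The helper name `split_put_records_result` is hypothetical; `response` is assumed to have the shape of boto3's `put_records` response, where `Records` is in request order and a failed item carries `ErrorCode` instead of `SequenceNumber`.

```python
def split_put_records_result(entries, response):
    """Pair each request entry with its result and separate failures.

    Hypothetical helper: `entries` is the list sent as `Records` in the
    request, `response` is the dict returned by the call.
    """
    succeeded, failed = [], []
    for entry, result in zip(entries, response["Records"]):
        if "ErrorCode" in result:
            # This entry was not stored; the caller has to retry it.
            failed.append(entry)
        else:
            succeeded.append((entry, result["SequenceNumber"]))
    return succeeded, failed
```

If the second of three entries fails, the first and third are already stored by the time the second is retried, so the retried record ends up after a record that was submitted later.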

Upvotes: 2

Ankur Kothari

Reputation: 908

You really don't have to worry about handling order in your consumers: they will receive the events for a given partition key in order, as long as you have set a proper partition key. Make sure your partition key maintains the order you need while distributing events evenly across Kinesis shards. Nothing else has to be done to maintain order.

Upvotes: 0

Baris

Reputation: 668

Use the key of the related data as the partition key. For example, if you want to process user data in order per user, the partition key can be the user ID. Then user1:data1, user1:data2 and user1:data3 will be processed in order, while user2:data2 may land in a different shard and is not ordered relative to user1's records.
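As a rough sketch of why records with the same partition key stay together: Kinesis hashes the partition key with MD5 to a 128-bit integer and routes the record to the shard whose hash key range contains that value. The function `shard_for_key` below is hypothetical, and it assumes the shards split the hash space into equal contiguous ranges and that the digest is read as a big-endian integer; a stream that has been resharded unevenly will not match this.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard a partition key would map to.

    Assumes `shard_count` shards with equal, contiguous hash key ranges
    (an illustrative simplification, not the service's actual layout).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, shard_count) by range position.
    return hash_value * shard_count // HASH_SPACE
```

The mapping is deterministic, so every record for `"user1"` goes to the same shard index, while `"user2"` may or may not share it.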

Upvotes: 0

Jason Wadsworth

Reputation: 8887

If you need guaranteed order of all data in the stream you can only have one shard. That, of course, doesn't scale very well. What you need to determine is whether you really need that level of ordered data. Is all the data in the stream related to all the other data? The key is to put data in shards when the data is related. Use multiple shards to allow your data to be processed in parallel. If all related data is together in one shard you can take advantage of the guaranteed ordering. If you really need all the data to be ordered you're just going to have to deal with the limited scaling that necessarily comes with that.

Upvotes: 8

sanketh s

Reputation: 367

[image not preserved]

I'm not sure about this, but I guess the image is saying that ordering is possible across multiple shards.

I assume a data stream is a logical grouping of shards. If that is true, then ordering should be possible, I suppose.

Please check and confirm.

Upvotes: -2

Dante

Reputation: 3891

It seems this is not possible to achieve. Ordering is guaranteed at the shard level, but not across the whole stream.

https://brandur.org/kinesis-order

So back to our original question: how can we guarantee that all records are consumed in the same order in which they’re produced? The answer is that we can’t, but that we shouldn’t let that unfortunate reality bother us too much. Once we’ve scaled our stream to multiple shards, there’s no mechanism that we can use to guarantee that records are consumed in order across the whole stream; only within a single shard.
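If approximate cross-shard order is good enough, one option is to push the logic to the consumer and merge per-shard batches on a timestamp set by the producer. The sketch below is only a best-effort illustration and not a guarantee: the field name `produced_at` and the function `merge_shard_batches` are hypothetical, and each batch is assumed to be already sorted by that field, which may not hold with several producers or skewed clocks.

```python
import heapq


def merge_shard_batches(*shard_batches):
    """Merge per-shard record lists into one list ordered by `produced_at`.

    Each batch is assumed to be sorted by the hypothetical `produced_at`
    field; the result is best-effort, not a stream-wide guarantee.
    """
    return list(heapq.merge(*shard_batches, key=lambda r: r["produced_at"]))


# Example with two shards whose records interleave in time.
shard_a = [{"produced_at": 1, "data": "a1"}, {"produced_at": 4, "data": "a2"}]
shard_b = [{"produced_at": 2, "data": "b1"}, {"produced_at": 3, "data": "b2"}]
merged = merge_shard_batches(shard_a, shard_b)
```

Records that arrive late on one shard would still be emitted out of order, which is why this only approximates the order of production.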

Upvotes: 28
