Reputation: 47
I am working on an application that reads and processes events from an AWS Kinesis stream using the Kinesis Client Library (KCL). I don't want the event producer side to suffer from latency, so I used the KinesisAsyncClient to send events. However, for my event processing to work properly, I need to process the events in the order in which I called putRecordAsync on the producer side. This information is available as a timestamp field inside each Kinesis record.
Aside from switching to the blocking synchronous Kinesis client, is there any other way to efficiently sort the streaming events?
Upvotes: 0
Views: 1175
Reputation: 16225
If ordering is important, do not use the async client.
The async client simply uses a thread pool under the covers to make the exact same calls. Because it is multithreaded, you cannot guarantee the execution order of those threads, and as a result you have no control over the order in which Kinesis receives those records.
Now, if latency is really an issue for your producer:
Make sure you're calling PutRecords (instead of PutRecord) where possible - this will definitely save you some network round-trips.
Rather than calling the client directly, put the in-order records into a local queue, and have a single background thread regularly poll that queue and call PutRecords.
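The local-queue idea could be sketched roughly as follows. This is only an illustration, not tested against a real stream: `OrderedPutQueue` and its `batchSender` callback are hypothetical names, and the callback stands in for whatever blocking PutRecords call you actually make with the synchronous client. The batch size of 500 reflects the per-request record limit of PutRecords.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: buffers records in call order and flushes them in batches from one thread. */
final class OrderedPutQueue<T> implements AutoCloseable {
    // PutRecords accepts at most 500 records per request.
    static final int MAX_BATCH = 500;

    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    // Assumption: this callback wraps a blocking PutRecords call on the synchronous client.
    private final Consumer<List<T>> batchSender;
    private final Thread worker;
    private volatile boolean running = true;

    OrderedPutQueue(Consumer<List<T>> batchSender) {
        this.batchSender = batchSender;
        this.worker = new Thread(this::drainLoop, "ordered-put-queue");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Called by the producer; returns immediately without touching the network. */
    void enqueue(T record) {
        queue.add(record);
    }

    private void drainLoop() {
        try {
            // Keep going until close() was called and everything queued has been sent.
            while (running || !queue.isEmpty()) {
                T first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing to send yet
                }
                List<T> batch = new ArrayList<>();
                batch.add(first);
                queue.drainTo(batch, MAX_BATCH - 1);
                // Only this thread sends, so batches leave in FIFO order.
                batchSender.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting work and waits for the remaining records to be flushed. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The producer then calls `enqueue(record)` instead of the client, and `close()` on shutdown. Note that this sketch does no error handling: an exception from the callback kills the worker thread, and retrying the failed entries of a partially successful PutRecords response can itself change the order, so you would need to decide how to handle that.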
Some other things to keep in mind: if this isn't fast enough to keep your in-process queue close to empty, your data throughput is high enough that you'll need multiple threads putting data, and you will no longer have exact ordering. If this is the case, I'd strongly suggest providing sequence numbers with your records so you can reorder them if necessary on the consumer side (also consider SQS as a starting point instead of Kinesis in that case).
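For the consumer-side reordering, one possible shape is a small buffer that holds early arrivals until the gap before them is filled. This is a sketch under assumptions: `ReorderBuffer` is a hypothetical class, and it assumes the producer stamps each record with its own gap-free counter (for example from an `AtomicLong`) stored in the payload, which is separate from the sequence numbers Kinesis assigns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical consumer-side buffer that releases payloads in producer sequence order. */
final class ReorderBuffer<T> {
    // Records that arrived ahead of their turn, keyed by producer-assigned sequence.
    private final TreeMap<Long, T> pending = new TreeMap<>();
    private long nextExpected;

    ReorderBuffer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    /** Accepts one record and returns every payload that is now deliverable, in order. */
    List<T> accept(long sequence, T payload) {
        List<T> ready = new ArrayList<>();
        if (sequence < nextExpected) {
            return ready; // already delivered: treat as a duplicate and drop it
        }
        pending.put(sequence, payload);
        // Release the contiguous run that starts at the next expected sequence.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }
}
```

In a KCL record processor you would call `accept` for each record and process whatever it returns. As written, a record that never arrives blocks everything behind it, so a real version would probably want a timeout or a size cap after which it skips the gap.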
Upvotes: 0