Reputation: 1534
I'm planning to use DynamoDB, whose data needs to be synced to CloudSearch. I understand Lambda can be used for this, but I want to use Kinesis instead. The producer would be DynamoDB, and it would write a record to the stream for each PUT/DELETE on the table.
My design is very straightforward: (assuming the consumer receives records in order)
I'm having trouble figuring out how the KCL would ensure ordered delivery of records on the consumer end when there are multiple shards. From the API documentation, here's what I understand:
However, if I want to sync the data from DynamoDB to CloudSearch, then I need to make sure that all records are synced in exactly the same order. Here's where I'm getting confused:
Upvotes: 0
Views: 1829
Reputation: 3599
If my thinking is correct, then how can I ever achieve ordered receive with two shards?
You don't do the synchronisation yourself. Instead, you need to choose the partition key carefully so that the resulting partitions can be processed independently of one another.
For example, say you're indexing records and each record has an id field. If records with different ids can be updated in your search index concurrently, then the record id is a suitable partition key: all changes to a given id land on the same shard and are read in order, while ordering between different ids does not matter.
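To make the partition key idea concrete, here is a rough sketch of how a key ends up on a shard. Kinesis takes the MD5 hash of the partition key as a 128-bit integer and picks the shard whose hash key range contains it. The sketch assumes the shards split the hash space evenly (the default for a fresh stream, but not guaranteed after resharding), and `shard_for_key` is a made-up helper name, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit integer


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive this partition key.

    Assumption: shard hash key ranges divide the 128-bit space evenly.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash into [0, shard_count) according to the even split.
    return hash_value * shard_count // HASH_SPACE


if __name__ == "__main__":
    for record_id in ["record-1", "record-2", "record-3"]:
        print(record_id, "-> shard", shard_for_key(record_id, 2))
```

The point is only that the mapping is deterministic: every PUT/DELETE for the same record id goes to the same shard, so the order of changes to that id is preserved even with two or more shards.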
Using the KCL:
It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).
https://aws.amazon.com/kinesis/streams/
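As a sanity check on the "partitions can be processed independently" claim, here is a small simulation with no AWS calls at all. It routes a sequence of PUT/DELETE changes to shards by record id, replays the shards in a random interleaving (each shard still read in its own order), and compares the resulting index with a strictly sequential replay. All names here (`shard_of`, `apply_change`, `replay_interleaved`) and the tuple layout of a change are assumptions made for illustration; this is not KCL code.

```python
import hashlib
import random
from collections import defaultdict


def shard_of(record_id, shard_count):
    # Hypothetical routing: MD5 of the key scaled onto evenly split shards.
    h = int.from_bytes(hashlib.md5(record_id.encode("utf-8")).digest(), "big")
    return h * shard_count // 2 ** 128


def apply_change(index, change):
    # A change is (operation, record_id, value); DELETE ignores the value.
    op, record_id, value = change
    if op == "PUT":
        index[record_id] = value
    else:
        index.pop(record_id, None)


def replay_sequential(changes):
    index = {}
    for change in changes:
        apply_change(index, change)
    return index


def replay_interleaved(changes, shard_count, rng):
    # Route every change to a shard, keeping arrival order within each shard.
    shards = defaultdict(list)
    for change in changes:
        shards[shard_of(change[1], shard_count)].append(change)

    index = {}
    cursors = {shard: 0 for shard in shards}
    pending = list(shards)
    # Pick shards in random order, but always consume a shard front to back.
    while pending:
        shard = rng.choice(pending)
        apply_change(index, shards[shard][cursors[shard]])
        cursors[shard] += 1
        if cursors[shard] == len(shards[shard]):
            pending.remove(shard)
    return index


CHANGES = [
    ("PUT", "a", 1), ("PUT", "b", 1), ("PUT", "a", 2), ("DELETE", "b", None),
    ("PUT", "c", 5), ("PUT", "b", 3), ("DELETE", "a", None), ("PUT", "c", 6),
]

if __name__ == "__main__":
    print(replay_sequential(CHANGES))
    print(replay_interleaved(CHANGES, 2, random.Random(0)))
```

Both replays end with the same index contents, because the final state of each id depends only on the order of that id's own changes, and that order is kept inside its shard. If your updates depended on ordering across different ids, the record id would not be a good partition key.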
Upvotes: 1