Reputation: 13427
I need to process, at peak, hundreds of records per second. The records are simple JSON bodies; they need to be collected and then processed/transformed and loaded into a database.
A few questions ...
1) Is Kinesis right for this? Or is SQS better suited?
2) When using Kinesis, should I use the Python example shown here: https://aws.amazon.com/blogs/big-data/snakes-in-the-stream-feeding-and-eating-amazon-kinesis-streams-with-python/ or should I implement my producer and consumer with the KCL (Kinesis Client Library)? What's the difference? (A sketch of the kind of boto-style producer I mean is below, after the questions.)
3) Does Kinesis offer anything for managing the consumers, or do I just run them on EC2 instances and manage them myself?
4) What is the correct pattern for accessing the data? I can't afford to miss any records, so I assume I should fetch from "TRIM_HORIZON" rather than "LATEST". If so, how do I manage duplicates? In other words, how do my consumers read from the stream, handle a consumer going down, and still know they have fetched every record?
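For reference, this is roughly the kind of producer the linked blog post describes (a minimal boto3 sketch; the stream name, region, and the `id` field are placeholders I made up, not anything settled on my side):

```python
# Minimal producer sketch, assuming boto3 and a hypothetical stream
# named "records" in us-east-1.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def put_record(record: dict) -> None:
    # PartitionKey determines which shard receives the record;
    # here we assume each record carries a unique "id" field.
    kinesis.put_record(
        StreamName="records",
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=str(record["id"]),
    )

put_record({"id": 1, "payload": "example"})
```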
Thanks!
Upvotes: 2
Views: 417
Reputation: 5659
Since you can't afford to miss any records, you should read from "TRIM_HORIZON" rather than "LATEST". If there are duplicates in your data, your consumers should take care of them by doing some bookkeeping of their own. As for consumers going down etc., the KCL handles those cases gracefully.
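To make that concrete, here is a minimal sketch of the plain-boto3 approach, assuming a single-shard stream named "records" and an application-level "id" field to dedupe on (both hypothetical). The KCL replaces the manual iterator handling and naive bookkeeping shown here with durable, lease-based checkpointing across workers:

```python
# Minimal single-shard consumer sketch (plain boto3, assuming a
# hypothetical stream named "records"); the KCL automates the
# iterator/checkpoint handling done by hand here.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def process(body: dict) -> None:
    """Placeholder for the transform-and-load-into-database step."""
    print(body)

stream = kinesis.describe_stream(StreamName="records")
shard_id = stream["StreamDescription"]["Shards"][0]["ShardId"]

# TRIM_HORIZON starts at the oldest record still retained in the
# stream, so nothing retained is skipped (unlike LATEST).
iterator = kinesis.get_shard_iterator(
    StreamName="records",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

seen_ids = set()  # naive in-memory bookkeeping for deduplication

while iterator is not None:
    response = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in response["Records"]:
        body = json.loads(record["Data"])
        # Dedupe on an application-level id, since producer retries
        # can put the same payload into the stream twice.
        if body["id"] in seen_ids:
            continue
        seen_ids.add(body["id"])
        process(body)
    iterator = response.get("NextShardIterator")
    time.sleep(1)  # stay under the per-shard read limits
```

Note that `seen_ids` lives only in process memory; for real deduplication you would persist the bookkeeping (e.g. in DynamoDB) so a restarted consumer resuming from its checkpoint does not reprocess records it has already handled.

Upvotes: 2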