Rahul

Reputation: 13056

How to handle reprocessing scenarios in AWS Kinesis?

I am exploring AWS Kinesis for a data processing requirement that replaces our old batch ETL processing with a stream-based approach.

One of the key requirements for this project is the ability to reprocess data in cases when

These scenarios are documented very well for Kafka here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios

I have seen the timestamp-based ShardIterator (AT_TIMESTAMP) in Kinesis, and I think a Kafka-like reset tool could be built on top of the Kinesis APIs, but it would be great if something like this already exists. Even if it doesn't, it would be good to learn from people who have solved similar problems.
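A minimal sketch of the "reset" step such a tool would need, in Python. It assumes client is a boto3 Kinesis client (boto3.client("kinesis")) or any object exposing the same list_shards and get_shard_iterator methods; the function name reset_iterators_to_timestamp is hypothetical. It pages through ListShards and requests an AT_TIMESTAMP iterator for every shard, which a consumer could then read from with GetRecords. It does not touch KCL checkpoints, and it does not order reads across parent and child shards after a reshard.

```python
from datetime import datetime
from typing import Any, Dict


def reset_iterators_to_timestamp(client: Any, stream_name: str, start: datetime) -> Dict[str, str]:
    """Return one AT_TIMESTAMP shard iterator per shard of stream_name.

    Hypothetical helper: client is assumed to be a boto3 Kinesis client
    (or anything with the same list_shards / get_shard_iterator methods).
    """
    iterators: Dict[str, str] = {}
    kwargs: Dict[str, Any] = {"StreamName": stream_name}
    while True:
        page = client.list_shards(**kwargs)
        for shard in page["Shards"]:
            # Ask Kinesis to start this shard at the first record at/after `start`.
            resp = client.get_shard_iterator(
                StreamName=stream_name,
                ShardId=shard["ShardId"],
                ShardIteratorType="AT_TIMESTAMP",
                Timestamp=start,
            )
            iterators[shard["ShardId"]] = resp["ShardIterator"]
        token = page.get("NextToken")
        if not token:
            break
        # ListShards rejects StreamName when NextToken is supplied.
        kwargs = {"NextToken": token}
    return iterators
```

Shard iterators expire five minutes after they are returned, so in practice you would request them just before the consumer starts reading.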

So, does anyone know of any existing resources, patterns and tools available to do this in Kinesis?

Upvotes: 7

Views: 1740

Answers (1)

Srivignesh KN

Reputation: 452

I have run into scenarios where I wanted to reprocess records that had already been processed from Kinesis, and I used Kinesis-VCR to re-process them.

Kinesis-VCR records Kinesis streams and maintains metadata about which recorded files hold the data for a given time.

Later, you can use it to reprocess/replay the events for any given time range.

Here is the GitHub link:

https://github.com/scopely/kinesis-vcr
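For comparison, here is a minimal sketch of replaying a time window directly with the Kinesis APIs, without Kinesis-VCR (it is not how Kinesis-VCR works internally). It only works while the data is still inside the stream's retention period, which is the limit that recording the stream elsewhere avoids. It assumes client is a boto3 Kinesis client and that start and end are timezone-aware datetimes, since boto3 returns ApproximateArrivalTimestamp as an aware datetime; the names replay_shard_range and handle are hypothetical. Because arrival timestamps are approximate, the end-of-window cut-off is approximate too.

```python
import time
from datetime import datetime
from typing import Any, Callable, Dict


def replay_shard_range(
    client: Any,
    stream_name: str,
    shard_id: str,
    start: datetime,
    end: datetime,
    handle: Callable[[Dict[str, Any]], None],
    poll_delay: float = 0.2,
) -> int:
    """Pass records arriving in [start, end) on one shard to handle; return the count.

    Hypothetical helper: client is assumed to be a boto3 Kinesis client.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AT_TIMESTAMP",
        Timestamp=start,
    )["ShardIterator"]
    replayed = 0
    while iterator:
        resp = client.get_records(ShardIterator=iterator, Limit=1000)
        for record in resp["Records"]:
            if record["ApproximateArrivalTimestamp"] >= end:
                return replayed  # first record past the window: stop
            handle(record)
            replayed += 1
        if not resp["Records"] and resp.get("MillisBehindLatest", 0) == 0:
            return replayed  # caught up with the tip of the shard
        # NextShardIterator is absent once a closed shard has been fully read.
        iterator = resp.get("NextShardIterator")
        time.sleep(poll_delay)  # GetRecords allows 5 calls/second per shard
    return replayed
```

To replay a whole stream you would run this once per shard, for example with the shard IDs returned by ListShards.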

Let me know if this works for you.

Thanks & Regards, Srivignesh KN

Upvotes: 1
