Reputation: 2571
I am researching Kafka for a specific use case I am working on. I have a stream of data flowing in that I want to process and publish to intermediary stages.
At each of these stages (initial and intermediary), Samza tasks would do the processing and re-publishing. One of my requirements is to be able to re-trigger the whole processing pipeline from a specific point in time whenever I want.
I know that Kafka maintains an offset for each partition of its logs (incoming data). However, does Kafka provide any functionality with which I can map partition offsets to some custom identifier (say, a timestamp) and use this to re-trigger the whole pipeline from that point onward?
I have read in multiple places that I can replay the Kafka commit log by resetting it to the beginning, or by rewinding it some N offsets. But is there a way for me to map these offsets to my own identifier, such as timestamps, and use that as a mechanism to tell Kafka from which offset to replay?
Best
Shabir
Upvotes: 0
Views: 1713
Reputation: 4532
You can use the command-line tool kafka-consumer-groups to reset a consumer group's offsets based on a timestamp (--reset-offsets --to-datetime). See the doc page for details: https://kafka.apache.org/documentation/#basic_ops_consumer_group
The same can, of course, be achieved programmatically.
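A minimal sketch of the programmatic route, using the consumer API's offsetsForTimes (available since Kafka 0.10.1) to map a timestamp to an offset and then seek there. The broker address, topic name ("stage-1"), group id, and timestamp are placeholder assumptions, and it assumes a single-partition topic and a running broker with kafka-clients on the classpath:

```java
import java.time.Instant;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "replay-group");            // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assume a single-partition topic named "stage-1" for brevity.
            TopicPartition tp = new TopicPartition("stage-1", 0);
            consumer.assign(Collections.singletonList(tp));

            // Map a wall-clock time to the earliest offset whose record
            // timestamp is >= that time.
            long replayFrom = Instant.parse("2017-08-04T00:00:00Z").toEpochMilli();
            Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(Collections.singletonMap(tp, replayFrom));

            OffsetAndTimestamp oat = offsets.get(tp);
            if (oat != null) {
                // Subsequent poll() calls replay the log from this offset.
                consumer.seek(tp, oat.offset());
            }
        }
    }
}
```

offsetsForTimes returns null for a partition when no record with a timestamp at or after the requested time exists, hence the null check before seeking.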
Upvotes: 2