Reputation: 143
My topic has many partitions, many producers, but only one consumer. The consumer can run only for a short period of time, let's say one minute. During this period, I need to make sure it will consume all the records from all the partitions that were produced before the consumer was initialized, but ALSO the records produced during the minute the consumer was running.
The problem is that can't find the correct partition assignation strategy that guarantee I will get all the partitions. If I use consumer.Subscribe(topic), I will only get some partitions, not all. If I use consumer.Assign(partition) during the initialization, I WILL get all the active partitions, but if a new partition comes along and receives records, I will miss those.
The only solution I have so far is to re-do the assignments periodically (every 10 seconds).
Upvotes: 0
Views: 1160
Reputation: 191701
If I use consumer.Subscribe(topic), I will only get some partitions, not all
It should get all. If you don't get all, then that means you more than likely have some un-closed consumer instance in the same consumer group that has already been assigned other partitions.
You can periodically run kafka-consumer-groups --describe
command to inspect this.
Using assignment doesn't use the consumer group protocol, thus why it would work.
There is no guarantee Kafka can provide that your consumer will read any/all data between two time intervals. You'd need to track this on your own, and may potentially require your consumer instance to run for longer than you expect.
Upvotes: 1