user2250246

Reputation: 3967

Preventing message loss with Kafka High Level Consumer 0.8.x

A typical Kafka consumer pipeline looks like the following:

kafka-broker ---> kafka-consumer ----> downstream-consumer like Elastic-Search

And according to the documentation for Kafka High Level Consumer:

The ‘auto.commit.interval.ms’ setting is how often updates to the consumed offsets are written to ZooKeeper

It seems that messages can be lost if both of the following happen:

  1. Offsets are committed just after a batch of messages is retrieved from the Kafka brokers.
  2. The downstream consumer (say, Elasticsearch) fails to process the most recent batch of messages, OR the consumer process itself is killed.

Ideally, offsets would not be committed automatically on a time interval but only through an explicit API call. That way the kafka-consumer could commit offsets only after it receives an acknowledgement from the downstream-consumer that the messages were processed successfully. There could be some replay of messages (if the kafka-consumer dies before committing offsets), but there would at least be no message loss.

Please let me know if such an API exists in the High Level Consumer.

Note: I am aware of the Low Level Consumer API in Kafka 0.8.x, but I do not want to manage everything myself when all I need is one simple API in the High Level Consumer.

Ref:

  1. AutoCommitTask.run(), look for commitOffsetsAsync
  2. SubscriptionState.allConsumed()

Upvotes: 3

Views: 1173

Answers (1)

user2250246

Reputation: 3967

There is a commitOffsets() API in the High Level Consumer API that can be used to solve this.

Also set the option "auto.commit.enable" to "false" so that offsets are never committed automatically by the Kafka consumer.
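A minimal sketch of this pattern with the 0.8 high-level consumer (the ZooKeeper address, topic name, group id, and the indexIntoElasticSearch helper are illustrative assumptions, not part of the question):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class AtLeastOnceConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumption: local ZooKeeper
        props.put("group.id", "my-group");                // assumption: example group id
        props.put("auto.commit.enable", "false");         // never auto-commit offsets

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream for one topic; "my-topic" is an example name.
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();

        while (it.hasNext()) {
            byte[] message = it.next().message();
            // Hand the message to the downstream consumer and wait for its
            // acknowledgement before committing the offset.
            indexIntoElasticSearch(message); // hypothetical helper; blocks until acked
            // Commit only after downstream success: messages may be replayed
            // after a crash, but none are lost.
            connector.commitOffsets();
        }
    }

    private static void indexIntoElasticSearch(byte[] message) {
        // placeholder for the real downstream call
    }
}
```

Note that commitOffsets() commits the current offsets for all partitions owned by the connector, so in practice you would usually commit once per processed batch rather than per message to reduce ZooKeeper writes.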

Upvotes: 4
