Muruga

Reputation: 123

High Level Consumer Failure in Kafka

I have the following Kafka setup:

    Number of producer : 1
    Number of topics : 1
    Number of partitions : 2
    Number of consumers : 3 (with same group id)
    Number of Kafka clusters : none (single Kafka server)
    Zookeeper.session.timeout : 1000
    Consumer Type : High Level Consumer

The producer produces messages without any specific partitioning logic (the default partitioner is used). Consumer 1 consumes messages continuously. When I abruptly kill consumer 1, I would expect consumer 2 or consumer 3 to take over and consume the messages.

In some cases a rebalance occurs and consumer 2 starts consuming messages, which is perfectly fine. But in other cases neither consumer 2 nor consumer 3 consumes anything. I then have to manually kill all the consumers and start all three again. Only after this restart does consumer 1 start consuming again.

To be precise, the rebalance succeeds in some cases and fails in others. Is there any configuration that I am missing?

Upvotes: 1

Views: 1798

Answers (1)

Denis Makarenko

Reputation: 2938

Kafka uses Zookeeper to coordinate high level consumers.

From http://kafka.apache.org/documentation.html :

Partition Owner registry

Each broker partition is consumed by a single consumer within a given consumer group. The consumer must establish its ownership of a given partition before any consumption can begin. To establish its ownership, a consumer writes its own id in an ephemeral node under the particular broker partition it is claiming.

/consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> consumer_node_id (ephemeral node)
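To make the template concrete, here is a minimal sketch that fills it in for the two partitions from the question. The class name, group name, topic name and broker id are made up for illustration, and the exact node layout may differ between Kafka versions, so treat the output as an example of the shape rather than the exact paths on your server.

```java
public class OwnerPath {
    // Fills in the owner-registry template quoted from the Kafka documentation:
    // /consumers/[group_id]/owners/[topic]/[broker_id-partition_id]
    public static String of(String groupId, String topic, int brokerId, int partitionId) {
        return "/consumers/" + groupId + "/owners/" + topic + "/" + brokerId + "-" + partitionId;
    }

    public static void main(String[] args) {
        // "my-group", "my-topic" and broker id 0 are hypothetical values.
        for (int partition = 0; partition < 2; partition++) {
            System.out.println(of("my-group", "my-topic", 0, partition));
        }
    }
}
```

With one topic and two partitions there are only two such nodes, so at most two of the three consumers in the group can own a partition at any time.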

Ephemeral nodes have a known quirk: they can linger for up to 30 seconds after a ZK client goes down suddenly. See http://developers.blog.box.com/2012/04/10/a-gotcha-when-using-zookeeper-ephemeral-nodes/

So you may be running into this if you expect consumers 2 and 3 to start reading messages immediately after consumer 1 is terminated.
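One way to reason about it: if the surviving consumers stop retrying the rebalance before the stale owner node disappears, they would end up owning nothing, which would match the symptom in the question. Below is a rough sketch of that arithmetic. It assumes the old high-level consumer's `rebalance.max.retries` and `rebalance.backoff.ms` settings, which are not mentioned in the question, and the values passed in are illustrative only. The class and method names are made up.

```java
public class RebalanceWindow {
    // Total time a consumer keeps retrying a rebalance before giving up,
    // assuming a fixed backoff between attempts.
    public static long retryWindowMs(int maxRetries, long backoffMs) {
        return (long) maxRetries * backoffMs;
    }

    // True if the retries outlast the time a stale owner node may linger.
    public static boolean covers(int maxRetries, long backoffMs, long lingerMs) {
        return retryWindowMs(maxRetries, backoffMs) > lingerMs;
    }

    public static void main(String[] args) {
        long lingerMs = 30_000; // the "up to 30 seconds" figure mentioned above
        System.out.println(covers(4, 2_000, lingerMs));  // 8 s of retries  -> false
        System.out.println(covers(20, 2_000, lingerMs)); // 40 s of retries -> true
    }
}
```

If the check comes out false for your settings, raising the retry count or the backoff on the consumers may be worth trying. I have not verified this against your setup.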

You can also check that /consumers/[group_id]/owners/[topic]/[broker_id-partition_id] contains correct data after rebalancing.

Upvotes: 2
