Shri Javadekar

Reputation: 327

Kafka consumer gets "Marking the coordinator dead" error when using a group-id

I have a Kafka cluster running on Kubernetes (on AWS). Each broker has its own external load balancer (ELB), and afaict Kafka's advertised.listeners has been set so that the ELB DNS names are returned when clients query for broker information. Most of the setup is similar to the one mentioned here.

I created a Kafka consumer without specifying any group-id, and with this consumer, reading messages from a topic worked just fine. However, if I set a group-id when creating the Kafka consumer, I get the following error messages:
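The two consumers described above can be sketched with kafka-python (the client the log lines come from). Per the kafka-python documentation, `group_id=None` disables coordinator-based partition assignment and offset commits, so only the group-id variant ever talks to a group coordinator. The topic name `my-topic` and the helper names are hypothetical; the bootstrap address is the one from the question.

```python
BOOTSTRAP = "my-elb.us-west-2.elb.amazonaws.com:32402"  # address from the question
TOPIC = "my-topic"  # hypothetical topic name


def consumer_settings(group_id=None):
    """Keyword arguments for kafka.KafkaConsumer (hypothetical helper).

    group_id=None: no group coordinator is used and offsets are not committed.
    A string group_id: the client must find the group's coordinator, join the
    group and commit offsets through it.
    """
    return {
        "bootstrap_servers": BOOTSTRAP,
        "group_id": group_id,
        "auto_offset_reset": "earliest",
        "consumer_timeout_ms": 10000,  # stop iterating after 10 s without messages
    }


def consume(group_id=None):
    """Print messages from TOPIC; needs kafka-python and a reachable broker."""
    from kafka import KafkaConsumer  # imported lazily so the helper above has no dependency

    consumer = KafkaConsumer(TOPIC, **consumer_settings(group_id))
    try:
        for message in consumer:
            print(message.partition, message.offset, message.value)
    finally:
        consumer.close()
```

In the setup described, `consume()` is the variant that worked, while `consume("my-group-id")` is the one that goes through group coordination and produced the log below.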

2018-01-30    22:04:16,763.763.313055038:kafka.cluster:140735643595584:INFO:74479:Group coordinator for my-group-id is BrokerMetadata(nodeId=2, host=u'a17ee9a8a032411e8a3c902beb474154-867008169.us-west-2.elb.amazonaws.com', port=32402, rack=None)
2018-01-30 22:04:16,763.763.804912567:kafka.coordinator:140735643595584:INFO:74479:Discovered coordinator 2 for group my-group-id
2018-01-30 22:04:16,764.764.270067215:kafka.coordinator.consumer:140735643595584:INFO:74479:Revoking previously assigned partitions set([]) for group my-group-id
2018-01-30 22:04:16,866.866.26291275:kafka.coordinator:140735643595584:INFO:74479:(Re-)joining group my-group-id
2018-01-30 22:04:16,898.898.787975311:kafka.coordinator:140735643595584:INFO:74479:Joined group 'my-group-id' (generation 1) with member_id kafka-python-1.3.5-e31607c2-45ec-4461-8691-260bb84c76ba
2018-01-30 22:04:16,899.899.425029755:kafka.coordinator:140735643595584:INFO:74479:Elected group leader -- performing partition assignments using range
2018-01-30 22:04:16,936.936.614990234:kafka.coordinator:140735643595584:WARNING:74479:Marking the coordinator dead (node 2) for group my-group-id: [Error 15] GroupCoordinatorNotAvailableError.
2018-01-30 22:04:17,069.69.8890686035:kafka.cluster:140735643595584:INFO:74479:Group coordinator for my-group-id is BrokerMetadata(nodeId=2, host=u'my-elb.us-west-2.elb.amazonaws.com', port=32402, rack=None)

my-elb.us-west-2.elb.amazonaws.com:32402 is accessible from the client: I ran kafkacat with my-elb.us-west-2.elb.amazonaws.com:32402 as the broker address, and it was able to list topics, consume from them, etc.
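A similar reachability check can be done from Python. This is a minimal sketch using only the standard library; it only confirms that a TCP connection to the advertised host and port can be opened, and says nothing about whether that broker can act as group coordinator. The function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, `can_connect("my-elb.us-west-2.elb.amazonaws.com", 32402)` should return `True` from the client machine in the setup described.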

Any ideas what might be wrong?

Upvotes: 1

Views: 3623

Answers (2)

Shri Javadekar

Reputation: 327

The problem was caused by 3 config settings in server.properties that were set incorrectly.

The broker-wide minimum number of in-sync replicas was 2 (min.insync.replicas=2). However, the internal offsets topic had a replication factor of only 1 (offsets.topic.replication.factor=1).

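When a consumer connected with a group-id, an entry for its group had to be written to the __consumer_offsets topic. Because that topic had only a single replica, its set of in-sync replicas could never reach the required 2, so the write was rejected. The broker reports that failure to the client as error 15 (GroupCoordinatorNotAvailableError), which is why the consumer marked the coordinator dead right after the group leader sent its partition assignment. On the broker side, the error was: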

org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync replicas for partition __consumer_offsets-42 is [1], below required minimum [2]
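The failing partition is one specific partition of __consumer_offsets because each consumer group's metadata and committed offsets live in a single partition of that topic, and the leader of that partition acts as the group's coordinator. To my understanding, the broker picks the partition as `abs(groupId.hashCode()) % offsets.topic.num.partitions` (50 partitions by default), with Kafka's `Utils.abs` returning 0 for `Integer.MIN_VALUE`. The sketch below reproduces that mapping under those assumptions; the function names are hypothetical.

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id, num_partitions=50):
    """__consumer_offsets partition that stores this group's data (assumed formula)."""
    h = java_string_hash(group_id)
    # Kafka's Utils.abs maps Integer.MIN_VALUE to 0 instead of overflowing.
    abs_h = 0 if h == -(1 << 31) else abs(h)
    return abs_h % num_partitions
```

A consumer without a group-id never sends group requests, so it never touches this partition, which matches the behavior in the question.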

I changed the required number of in-sync replicas (min.insync.replicas) to 1, and things started working fine.
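A mismatch like this can be caught before deploying by comparing the two settings. The sketch below is a hypothetical helper, not part of Kafka: it parses only simple `key=value` lines, and it falls back to Kafka's broker defaults (min.insync.replicas=1, offsets.topic.replication.factor=3) when a key is absent. Note that offsets.topic.replication.factor only takes effect when __consumer_offsets is first created, so an existing topic keeps its original replication factor regardless of what this check says.

```python
def parse_properties(text):
    """Parse simple key=value lines from a Java-style .properties file.

    Handles only '=' separators and '#'/'!' comments; not a full parser.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_offsets_topic(props):
    """Return a warning if __consumer_offsets can never satisfy min.insync.replicas."""
    min_isr = int(props.get("min.insync.replicas", "1"))  # Kafka default: 1
    offsets_rf = int(props.get("offsets.topic.replication.factor", "3"))  # Kafka default: 3
    if offsets_rf < min_isr:
        return (f"offsets.topic.replication.factor={offsets_rf} is below "
                f"min.insync.replicas={min_isr}; group offset writes will fail")
    return None
```

With the original settings from this answer, `check_offsets_topic` returns a warning; after lowering min.insync.replicas to 1 it returns `None`.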

Upvotes: 0

Abhimanyu

Reputation: 2740

The client marks the coordinator dead when a request to it fails, for example because of a network communication error between the consumer and the coordinator, or because the coordinator broker died and the group needs to rebalance. Many different requests (offset commits, offset fetches, etc.) can trigger this, so to find the root cause, set the client logging level to TRACE (or DEBUG):

logging.level.org.apache.kafka=TRACE
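The line above uses Spring Boot property syntax and applies to the Java client's loggers. The question uses kafka-python, which logs through Python's standard `logging` module under logger names starting with `kafka` (for example `kafka.coordinator` in the log above). A rough equivalent, assuming a plain Python script; Python's standard logging has no TRACE level, so DEBUG is the most verbose built-in option. The function name is hypothetical.

```python
import logging


def enable_kafka_debug_logging():
    """Send kafka-python's log records (logger names 'kafka.*') to stderr at DEBUG."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    kafka_logger = logging.getLogger("kafka")
    kafka_logger.setLevel(logging.DEBUG)  # child loggers such as kafka.coordinator inherit this
    kafka_logger.addHandler(handler)
    return kafka_logger
```

Calling this before creating the consumer should surface the individual coordinator requests and their error codes.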

Upvotes: 1
