Reputation: 327
I have a Kafka cluster running on Kubernetes (on AWS). Each broker has a corresponding external load balancer (ELB) and, as far as I can tell, Kafka's advertised.listeners have been set appropriately so that the ELB's DNS names are returned when clients query for broker information. Most of the setup is similar to the one mentioned here.
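Concretely, each broker advertises its own ELB DNS name, roughly like the sketch below for broker 2 (the internal listener port and the PLAINTEXT protocol are assumptions; the ELB name and port are the ones that show up in the logs further down):
# server.properties on broker 2 (sketch; internal port and security protocol assumed)
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://a17ee9a8a032411e8a3c902beb474154-867008169.us-west-2.elb.amazonaws.com:32402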
I created a Kafka consumer without specifying any group-id. With this consumer, reading messages from a topic worked just fine. However, if I set a group-id when creating the consumer, I get the following error messages:
2018-01-30 22:04:16,763.763.313055038:kafka.cluster:140735643595584:INFO:74479:Group coordinator for my-group-id is BrokerMetadata(nodeId=2, host=u'a17ee9a8a032411e8a3c902beb474154-867008169.us-west-2.elb.amazonaws.com', port=32402, rack=None)
2018-01-30 22:04:16,763.763.804912567:kafka.coordinator:140735643595584:INFO:74479:Discovered coordinator 2 for group my-group-id
2018-01-30 22:04:16,764.764.270067215:kafka.coordinator.consumer:140735643595584:INFO:74479:Revoking previously assigned partitions set([]) for group my-group-id
2018-01-30 22:04:16,866.866.26291275:kafka.coordinator:140735643595584:INFO:74479:(Re-)joining group my-group-id
2018-01-30 22:04:16,898.898.787975311:kafka.coordinator:140735643595584:INFO:74479:Joined group 'my-group-id' (generation 1) with member_id kafka-python-1.3.5-e31607c2-45ec-4461-8691-260bb84c76ba
2018-01-30 22:04:16,899.899.425029755:kafka.coordinator:140735643595584:INFO:74479:Elected group leader -- performing partition assignments using range
2018-01-30 22:04:16,936.936.614990234:kafka.coordinator:140735643595584:WARNING:74479:Marking the coordinator dead (node 2) for group my-group-id: [Error 15] GroupCoordinatorNotAvailableError.
2018-01-30 22:04:17,069.69.8890686035:kafka.cluster:140735643595584:INFO:74479:Group coordinator for my-group-id is BrokerMetadata(nodeId=2, host=u'my-elb.us-west-2.elb.amazonaws.com', port=32402, rack=None)
my-elb.us-west-2.elb.amazonaws.com:32402 is accessible from the client. When I used kafkacat with my-elb.us-west-2.elb.amazonaws.com:32402 as the broker address, it was able to list topics, consume topics, etc.
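For reference, the consumers are created roughly like this (a minimal sketch with kafka-python 1.3.5, the client shown in the logs; the topic name and bootstrap address are placeholders):
from kafka import KafkaConsumer

# Works: no group_id, so the client never talks to the group coordinator
consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers='my-elb.us-west-2.elb.amazonaws.com:32402',
)

# Fails: with a group_id the client joins a consumer group via the coordinator
consumer_with_group = KafkaConsumer(
    'my-topic',
    bootstrap_servers='my-elb.us-west-2.elb.amazonaws.com:32402',
    group_id='my-group-id',
)

for message in consumer_with_group:
    print(message.value)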
Any ideas what might be wrong?
Upvotes: 1
Views: 3623
Reputation: 327
The problem was with 3 config settings in server.properties that were set incorrectly.
The default minimum number of in-sync replicas was 2 (min.insync.replicas=2). However, the internal topic settings had a replication factor of 1 (offsets.topic.replication.factor=1).
When a consumer connected with a group-id, a corresponding entry had to be made in the __consumer_offsets topic. When this topic was updated, only a single replica was written, which triggered errors that the number of in-sync replicas was below the required minimum:
org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync replicas for partition __consumer_offsets-42 is [1], below required minimum [2]
I changed the required number of in-sync replicas to 1 and things started working fine.
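In server.properties terms, the fix amounts to something like this sketch (the offsets-topic setting is shown with its original value, and min.insync.replicas is lowered to match it):
# server.properties (sketch of the relevant settings)
# Internal topics such as __consumer_offsets were created with a single replica
offsets.topic.replication.factor=1
# Lower the required in-sync replica count so writes to single-replica topics succeed
min.insync.replicas=1
Alternatively, the two settings can be made consistent the other way around, but note that changing offsets.topic.replication.factor only affects topic creation; on a cluster where __consumer_offsets already exists, its replication factor would have to be increased via a partition reassignment.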
Upvotes: 0
Reputation: 2740
Marking the coordinator dead happens when there is a network communication error between the consumer client and the coordinator (it can also happen when the coordinator dies and the group needs to rebalance). A variety of operations (offset commit requests, offset fetches, etc.) can trigger this issue, so to find the root cause you need to raise the logging level to TRACE or DEBUG:
logging.level.org.apache.kafka=TRACE
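That property is a Spring Boot-style setting for the Java client. For the kafka-python client used in the question, a rough equivalent is to raise the level of the 'kafka' loggers via the standard logging module (a sketch; the logger names match the ones visible in the question's log output, e.g. kafka.cluster and kafka.coordinator):
import logging

# Emit kafka-python's internal logs (kafka.cluster, kafka.coordinator, ...) at DEBUG level
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(name)s %(levelname)s %(message)s',
)
logging.getLogger('kafka').setLevel(logging.DEBUG)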
Upvotes: 1