Reputation: 7940
I have a Kafka cluster hosted in GKE. Google updates GKE nodes in weekly basis and whenever this happens Kafka becomes temporarily unavailable and this causes massive error/rebalance to get it backup to healthy state. Currently we rely on K8 retry to eventually succeed once upgrade completes and cluster becomes available. Is there a way to gracefully handle this type of situation in Kafka or avoid it if possible?
Upvotes: 0
Views: 192
Reputation: 11
You have control over the GKE Node's auto upgrade through "upgrade maintenance window" to decide when upgrades should occur. Based on your business criticality you can configure this option along with K8 retry feature.
Upvotes: 0
Reputation: 650
In order to be able to inform you better, you would have to give us a little more information, what is your setup? Versions of Kube and Kafka? How many Kafka & ZK pods? How are you deploying your Kafka cluster (via a simple helm chart or an operator?) What are the exact symptoms you see when you upgrade your kube cluster? What errors do you get? What is the state of the Kafka cluster etc.? How do you monitor it?
But here are some points worth investigating.
I would strongly encourage you to use take a look at https://strimzi.io/ which can be very helpful if you want to operate Kafka on Kube. It is open source operator and very well documented.
Upvotes: 2