Reputation: 73
I'm looking into options for managing a complete data centre failover when a Kafka cluster spans two DCs, whilst guaranteeing availability of partitions afterwards. Having the cluster span DCs is preferable to us over the added complexity of MirrorMaker/Replicator, and we have a high-speed link between the two to reduce latency.
Kafka has the concept of rack awareness, so the replicas of a topic will automatically be spread across both racks; however, I am struggling to see a combination of replication factor and min.insync.replicas that will not result in reduced availability or data loss after a rack failover.
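For reference, rack awareness is driven by the broker.rack property in each broker's server.properties; a minimal sketch for this kind of setup (the broker IDs and rack labels here are just placeholders) would be:

    # server.properties for a broker in DC/rack 1 (IDs and labels illustrative)
    broker.id=1
    broker.rack=dc1

    # server.properties for a broker in DC/rack 2
    broker.id=3
    broker.rack=dc2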
Assuming the simplest scenario below, I am looking for a configuration that can handle the complete loss of rack 1 without making partitions unavailable or causing data loss (potentially at the cost of higher latency if needed):
Everything works fine when all 4 replicas are kept in sync with acks=all; however, because min.insync.replicas is set to 2, couldn't it be the case that both in-sync replicas are in rack 1 only? In the event of a complete rack 1 failure there would then be no ISRs (and no leader) at all, so no messages could be produced to or read from the topic?
Broker | DC/Rack | Topic Replica
1      | 1       | 1 (in sync)
2      | 1       | 2 (in sync)       <-- min.insync.replicas=2 + acks=all satisfied
3      | 2       | 3 (out of sync)
4      | 2       | 4 (out of sync)   <-- no in-sync replicas in rack 2
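For completeness, a sketch of how this scenario could be created, assuming a placeholder topic name and bootstrap address:

    # topic with 4 replicas, spread 2 per rack by rack awareness
    kafka-topics.sh --bootstrap-server broker1:9092 --create \
      --topic test-topic --partitions 1 --replication-factor 4 \
      --config min.insync.replicas=2

    # producer configuration
    acks=all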
Am I missing something, or is there some other configuration that will allow for a complete rack failure? Is managing a complete rack failure without the risk of unclean leader elections or lower availability possible with Kafka?
Upvotes: 1
Views: 960
Reputation: 1994
Scaling up the above topology might not work well because of the extra IO; if you need to increase the cluster to 6 brokers (to scale reads/writes), then you'll need min.insync.replicas = 4 and a replication factor >= 5.
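By that reasoning, a 6-broker stretch (3 per DC) would be configured along these lines, reusing the placeholder topic name and address from above:

    # 6 brokers, 3 per rack: min ISR kept above the per-rack broker count,
    # so at least one in-sync replica must sit in each DC
    kafka-topics.sh --bootstrap-server broker1:9092 --create \
      --topic test-topic --partitions 1 --replication-factor 5 \
      --config min.insync.replicas=4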
Why can't we use replication factor >= 4?
Upvotes: 0
Reputation: 1082
For a 4-broker cluster stretched over two DCs, in order to prevent data loss and survive one DC failure you'll have to:
- produce with acks=all
- use a replication factor of 4
- set min.insync.replicas to 3
Note: with at most 2 brokers per DC, min.insync.replicas=3 guarantees at least one in-sync replica in each DC, so no acknowledged data is lost; but once a DC is down only 2 replicas remain, so acks=all producers will be blocked until the DC returns or min.insync.replicas is lowered (you trade write availability for the no-data-loss guarantee).
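A sketch of that configuration, and of temporarily relaxing min.insync.replicas after losing a DC (the topic name and addresses are placeholders):

    # topic: a replica on every broker; min ISR of 3 forces at least one
    # in-sync copy into each DC
    kafka-topics.sh --bootstrap-server broker1:9092 --create \
      --topic test-topic --partitions 1 --replication-factor 4 \
      --config min.insync.replicas=3

    # producer configuration
    acks=all

    # after a whole DC is lost, acks=all writes stall (only 2 replicas remain);
    # one option is to lower min.insync.replicas on the surviving brokers
    kafka-configs.sh --bootstrap-server broker3:9092 --alter \
      --entity-type topics --entity-name test-topic \
      --add-config min.insync.replicas=2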
Upvotes: 1