Reputation: 153
Suppose I want a highly available Kafka setup in production on a small deployment. I have to use the following configs:
min.insync.replicas=2 // Don't want to lose messages if 1 broker crashes
default.replication.factor=3 // Lets producers keep writing if 1 replica disappears in a broker crash
Will Kafka start creating a new replica if 1 broker crashes and 1 replica is gone with it?
Do we need at least default.replication.factor brokers at all times for the cluster to keep working?
Upvotes: 14
Views: 21734
Reputation: 39930
In order to enable high availability in Kafka you need to take into account the following factors:
1. Replication factor: By default, the replication factor is set to 1. The recommended replication factor for production environments is 3, which means that at least 3 brokers are required.
2. Preferred Leader Election: When a broker is taken down, one of the replicas becomes the new leader for a partition. Once the failed broker is up and running again, it initially leads no partitions; Kafka restores the data it missed while it was down, and the broker then becomes the partition leader again. Preferred leader election is enabled by default. To minimize the risk of losing messages when switching back to the preferred leader, set the producer property acks to all (this comes at a performance cost).
3. Unclean Leader Election: You can enable unclean leader election in order to allow an out-of-sync replica to become the leader and maintain high availability of a partition. With unclean leader election, messages that were not synced to the new leader are lost. There is a trade-off between consistency and high availability meaning that with unclean leader election disabled, if a broker containing the leader replica for a partition becomes unavailable, and no in-sync replica exists to replace it, the partition becomes unavailable until the leader replica or another in-sync replica is back online.
4. Acknowledgements: The producer property acks controls how many replicas must commit a new message before it is acknowledged. When acks is set to 0, the producer treats the message as sent immediately, without waiting for any broker to acknowledge it. When set to 1, the message is acknowledged once the leader commits it. Setting acks to all provides the highest consistency guarantee but slows writes to the cluster.
5. Minimum in-sync replicas: min.insync.replicas defines the minimum number of in-sync replicas that must be available for a producer to successfully send messages to the partition. If min.insync.replicas is set to 2 and acks is set to all, each message must be written successfully to at least two replicas. This means that messages won't be lost unless both brokers fail (unlikely). If a broker failure leaves only one in-sync replica, the partition will no longer be available for writes. Again, this is a trade-off between consistency and availability.
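The interaction between acks and min.insync.replicas described in points 4 and 5 can be sketched as a small model. This is only an illustration of the rule, not Kafka's broker code: the function name write_accepted and its parameters are hypothetical, and isr_size is assumed to count the leader.

```python
def write_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Model whether a produce request to one partition is accepted.

    Hypothetical helper: isr_size is the number of in-sync replicas,
    including the leader.
    """
    if isr_size < 1:
        # No in-sync replica means there is no leader to write to.
        return False
    if acks in ("0", "1"):
        # min.insync.replicas is only enforced for acks=all.
        return True
    if acks == "all":
        return isr_size >= min_insync_replicas
    raise ValueError(f"unknown acks value: {acks!r}")


# Three replicas, one broker down: two in-sync replicas remain.
print(write_accepted("all", isr_size=2, min_insync_replicas=2))  # True
# A second broker goes down: writes with acks=all are rejected.
print(write_accepted("all", isr_size=1, min_insync_replicas=2))  # False
```

With acks set to 1 the same degraded partition would still take writes, which is why the durability guarantee depends on both settings together.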
Upvotes: 26
Reputation: 20880
Well, you can set replication.factor to the same value as min.insync.replicas, but there may be some challenges.
During a broker outage, all partition replicas on that broker become unavailable. The availability of the affected partitions is then determined by the existence and status of their other replicas.
If a partition has no additional replica, the partition becomes totally unavailable. But if a partition has additional replicas that are in-sync, one of these in-sync replicas will become the interim partition leader. If the partition has additional replicas but none are in-sync, we have a choice to make: either wait for the partition leader to come back online, sacrificing availability, or allow an out-of-sync replica to become the interim partition leader, sacrificing consistency.
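The three cases above can be sketched as a small decision function. It is a simplified model, not Kafka's controller logic: the name elect_leader, its parameters, and the choice to pick candidates in assignment order are assumptions made for illustration. The unclean_allowed flag stands in for the broker/topic setting unclean.leader.election.enable.

```python
from typing import Optional, Sequence, Set


def elect_leader(
    replicas: Sequence[int],
    isr: Set[int],
    live: Set[int],
    unclean_allowed: bool,
) -> Optional[int]:
    """Return the broker id of the interim leader, or None if the
    partition stays offline. Hypothetical, simplified model."""
    # Case 1: a live in-sync replica exists, so it takes over safely.
    for broker in replicas:
        if broker in live and broker in isr:
            return broker
    # Case 2: only out-of-sync replicas are alive. Electing one keeps the
    # partition available but may lose messages it never replicated.
    if unclean_allowed:
        for broker in replicas:
            if broker in live:
                return broker
    # Case 3: wait for an in-sync replica to come back (no leader for now).
    return None


# Broker 1 (the leader) is down; broker 2 is in sync, broker 3 is not.
print(elect_leader([1, 2, 3], isr={1, 2}, live={2, 3}, unclean_allowed=False))  # 2
# Only broker 1 was in sync, and it is down.
print(elect_leader([1, 2, 3], isr={1}, live={2, 3}, unclean_allowed=False))  # None
print(elect_leader([1, 2, 3], isr={1}, live={2, 3}, unclean_allowed=True))  # 2
```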
So it becomes necessary for any partition to have an extra in-sync replica available to survive the loss of the partition leader. That implies that min.insync.replicas should be set to at least 2.
In order to have a minimum ISR size of 2, the replication factor must be at least 2 as well. However, if there are only 2 replicas and one broker is unavailable, the ISR size will decrease to 1, below the minimum. Hence, it is better to have a replication factor greater than the minimum ISR size (at least 3).
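The arithmetic behind that recommendation is short enough to write out. The sketch below assumes producers use acks=all and that each replica of a partition lives on a different broker; the function name tolerated_broker_failures is hypothetical.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Number of replica-holding brokers that can fail while the partition
    still accepts acks=all writes (hypothetical helper)."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    # Each failed broker removes one replica from the ISR; writes stop
    # once the ISR is smaller than min.insync.replicas.
    return replication_factor - min_insync_replicas


for rf in (2, 3):
    print(f"RF={rf}, min ISR=2 -> tolerates {tolerated_broker_failures(rf, 2)} failure(s)")
# RF=2, min ISR=2 -> tolerates 0 failure(s)
# RF=3, min ISR=2 -> tolerates 1 failure(s)
```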
Upvotes: 1