Reputation: 15435
I'm a bit confused on the Topic partitioning in Apache Kafka. So I'm charting down a simple use case and I would like to know what happens in different scenarios. So here it is:
I have a Topic T that has 4 partitions TP1, TP2, TP4 and TP4.
Assume that I have 8 messages M1 to M8. Now when my producer sends these messages to the topic T, how will they be received by the Kafka broker under the following scenarios:
Scenario 1: There is only one kafka broker instance that has Topic T with the afore mentioned partitions.
Scenario 2: There are two kafka broker instances with each node having same Topic T with the afore mentioned partitions.
Now assuming that kafka broker instance 1 goes down, how will the consumers react? I'm assuming that my consumer was reading from broker instance 1.
Upvotes: 0
Views: 1779
Reputation: 1365
I'll answer your questions by walking you through partition replication, because you need to learn about replication to understand the answer.
A single broker is considered the "leader" for a given partition. All produces and consumes occur with the leader. Replicas of the partition are replicated to a configurable amount of other brokers. The leader handles replicating a produce to the other replicas. Other replicas that are caught up to the leader are called "in-sync replicas." You can configure what "caught up" means.
A message is only made available to consumers when it has been committed to all in-sync replicas.
If the leader for a given partition fails, the Kafka coordinator will elect a new leader from the list of in-sync replicas and consumers will begin consuming from this new leader. Consumers will have a few milliseconds of added latency while the new leader is elected. A new coordinator will also be elected automatically if the coordinator fails (this adds more latency, too).
If the topic is configured with no replicas, then when the leader of a given partition fails, consumers can't consume from that partition until the broker that was the leader is brought back online. Or, if it is never brought back online, the data previously produced to that partition will be lost forever.
To answer your question directly:
2
.You may also be interested to learn about acks
in the producer.
In the producer, you can configure acks
such that the produce is acknowledged when:
acks=0
)acks=1
)acks=all
)Further, you can configure the minimum number of in-sync replicas required to commit a produce. Then, in the event when not enough in-sync replicas exist given this configuration, the produce will fail. You can build your producer to handle this failure in different ways: buffer, retry, do nothing, block, etc.
Upvotes: 6