Reputation: 217
I have an Apache Kafka 0.8 cluster with the following setup:
1) 3 brokers, all running on the same machine
2) One topic with 10 partitions and 3 replicas
I have 20 producers producing to this single topic.
I have 10 consumers, one for each partition.
I am testing how the brokers handle failures.
When all brokers are up and running, the number of messages consumed equals the number of messages produced.
However, when I bring the brokers down one by one, I observe that more messages are consumed than were produced.
What could be the reason for this?
Upvotes: 1
Views: 738
Reputation: 3978
First a thought:
Unless you have separate disks for each broker, it is highly recommended that you use separate machines for each broker. This is because each disk has a maximum I/O throughput that the brokers want to utilize, and if you have multiple brokers using the same disk, all the brokers will compete for I/O.
How quickly are you bringing the brokers down? Instant kill or graceful shutdown? How much time until the next broker is killed? What is your message acknowledgement level? What is the rate at which you are producing messages?
If a broker takes a while to go down, the producer might send a message to the dying broker, which might append and replicate it but then die before sending an acknowledgement back to the producer. The producer would then conclude that the message was not successfully written and resend the same message to the new leader. The new leader has no way of knowing that the message is a duplicate, so it appends it to the log as a new message.
This is a race condition, and it is very unlikely except at high produce rates with acknowledgement level -1 (request.required.acks=-1, where the leader waits for all in-sync replicas before acknowledging).
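To make the race concrete, here is a minimal toy simulation in Python. It does not use the Kafka client at all: the `Broker` class, the `produce` function and the shared `replicated_log` list are all hypothetical stand-ins I made up for illustration, assuming the old leader manages to replicate the message before it dies.

```python
class Broker:
    """Toy stand-in for a partition leader; not the real Kafka API."""

    def __init__(self, log, dies_before_ack=False):
        self.log = log  # stands in for the replicated partition log
        self.dies_before_ack = dies_before_ack

    def send(self, message):
        self.log.append(message)  # message is appended and replicated
        if self.dies_before_ack:
            # the write succeeded, but the acknowledgement never goes out
            raise ConnectionError("leader died before acknowledging")
        return "ack"


def produce(message, leaders, max_retries=3):
    """Send to the current leader; on failure, retry against the next one."""
    for attempt in range(max_retries + 1):
        leader = leaders[min(attempt, len(leaders) - 1)]
        try:
            return leader.send(message)
        except ConnectionError:
            continue  # producer assumes the write was lost and resends
    raise RuntimeError("message could not be delivered")


replicated_log = []
old_leader = Broker(replicated_log, dies_before_ack=True)
new_leader = Broker(replicated_log)  # follower promoted after the failure

produced = ["m1"]
for m in produced:
    produce(m, [old_leader, new_leader])

consumed = list(replicated_log)
# produced count, consumed count, distinct consumed count
print(len(produced), len(consumed), len(set(consumed)))
```

One message is produced, but the consumer sees it twice. If your messages carry a unique ID, comparing the total consumed count with the count of distinct IDs in the same way should tell you whether the extra messages are duplicates from producer retries.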
Upvotes: 1