Reputation: 2440
I'm reading the documentation for Kafka and it says here:
For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log.
http://kafka.apache.org/documentation.html#introduction (It's right above 1.2 Use Cases)
How this is possible? From my understanding the topics under the hood use ZooKeeper which uses Zab (A Paxos-like algorithm). I couldn't find any documentation about Zab asides from this page:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos
Can someone explain to me how they can support N-1 failures. Isn't N-1 literally just everything down asides from the machine itself?
Also, if anyone know any good places to read up or videos on Zab please let me know.
Asides from this http://web.stanford.edu/class/cs347/reading/zab.pdf because I was hoping for something easier.
Thanks
Upvotes: 0
Views: 681
Reputation: 3172
I can help you answer the Kafka/Zookeeper part of your question. I think you're confusing how Kafka and Zookeeper work together.
I think it's probably better to think of Kafka and Zookeeper operating independently, but needing both to work to get the job done. Both Kafka and Zookeeper can fail on their own accord.
Both Kafka and Zookeeper have different rules for what constitutes failure.
I don't know anything about the algorithm you mention that is used in Zookeeper, Zab (A Paxos-like algorithm), but to my understanding this is how Kafka and Zookeeper work together.
Upvotes: 3