Reputation: 287
I want to install the latest release of Kafka on my Ubuntu Hadoop cluster, which has 1 master node and 4 data nodes.
Here are my questions:
Should Kafka be installed on all the machines or only on the NameNode machine?
What about ZooKeeper? Should it be installed on all the machines or only on the NameNode machine?
Please share the documentation needed to install Kafka and ZooKeeper on a 5-node Hadoop cluster.
Upvotes: 2
Views: 1197
Reputation: 4600
The architecture depends strictly on your requirements and on what you have: how powerful your machines are, how much data they need to process, how many consumers the Kafka instances need to feed, and so on. In theory you can run 1 Kafka instance and 1 ZooKeeper instance, but that setup is not fault-tolerant: if that single instance fails, the service goes down and you can lose data.
You can find more information about ZooKeeper multi-server (replicated) setups here.
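As a rough illustration of what a multi-server ZooKeeper setup involves, the sketch below renders a `zoo.cfg` for a three-server ensemble. The hostnames (`datanode1` to `datanode3`), the data directory, and the timing values are assumptions for illustration and are not taken from the question; 2888 and 3888 are the peer and leader-election ports conventionally used in ZooKeeper examples.

```python
def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Return zoo.cfg text for a replicated ZooKeeper ensemble.

    hosts: hypothetical hostnames, one per ZooKeeper server.
    data_dir and the timing values below are illustrative defaults.
    """
    lines = [
        "tickTime=2000",   # base time unit in milliseconds
        "initLimit=10",    # ticks a follower may take to connect to the leader
        "syncLimit=5",     # ticks a follower may lag behind the leader
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # One server.<id> line per ensemble member, ids starting at 1.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_zoo_cfg(["datanode1", "datanode2", "datanode3"]))
```

The same file would go on every ZooKeeper machine. Each server also needs a `myid` file in its `dataDir` containing only its own id (1, 2, or 3 here), matching its `server.<id>` line.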
What I would do first is to try to analyze
These are just a few factors to consider before you start building an infrastructure. If you want a rough estimate based on "just" 5 machines, assuming they are all equally powerful and have a good amount of memory (e.g., 32 GB per machine), you need at least a couple of Kafka nodes and at least 3 machines for ZooKeeper (an ensemble of 2N + 1 servers tolerates N failures), so that if one fails, ZooKeeper can handle the failure.
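To make the 2N + 1 rule concrete, here is a small sketch of the majority arithmetic behind it. The function names are made up for this example; the only assumption is that ZooKeeper stays available as long as a strict majority of the ensemble is up.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, 3 servers tolerate 1 failure, and a 4th server adds nothing (still 1), which is why odd ensemble sizes such as 3 or 5 are the usual choice.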
Upvotes: 1