priyanka

How to install Kafka in a Hadoop cluster

I want to install the latest release of Kafka on my Ubuntu Hadoop cluster, which has 1 master node and 4 data nodes.

Here are my questions:

Should Kafka be installed on all the machines or only on the NameNode machine?

What about ZooKeeper? Should it be installed on all the machines or only on the NameNode machine?

Please share the documentation needed to install Kafka and ZooKeeper on a 5-node Hadoop cluster.

Answers (1)

Markon

The architecture depends entirely on your requirements and on what you have: how powerful your machines are, how much data they need to process, how many consumers the Kafka instances need to feed, and so on. In theory you can run 1 Kafka instance and 1 ZooKeeper server, but that setup is not fault-tolerant: if either one fails, the service goes down and you can lose data.

You can find more information about running ZooKeeper with multiple servers here.
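As a rough illustration of what a multi-server ZooKeeper setup involves, the sketch below generates a zoo.cfg for a three-server ensemble. The hostnames node1, node2 and node3 and the data directory are hypothetical placeholders; the timing values mirror the replicated-mode example in the ZooKeeper Getting Started guide, and 2888/3888 are the conventional peer and leader-election ports. Treat it as a starting point under those assumptions, not a production config.

```python
# Sketch: build a zoo.cfg for a replicated ZooKeeper ensemble.
# Hostnames and paths are hypothetical placeholders, not values from the answer.

def zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Return zoo.cfg text shared by every server in the ensemble.

    Each server additionally needs a 'myid' file in data_dir that holds
    its own server number (1, 2, 3, ...), matching the server.N lines.
    """
    lines = [
        "tickTime=2000",             # basic time unit in milliseconds
        f"dataDir={data_dir}",       # snapshots and the myid file live here
        f"clientPort={client_port}", # port that clients (e.g. Kafka) connect to
        "initLimit=5",               # ticks a follower may take to connect and sync
        "syncLimit=2",               # ticks a follower may fall behind the leader
    ]
    # server.N=host:peerPort:electionPort, numbered from 1
    for i, host in enumerate(hosts, start=1):
        lines.append(f"server.{i}={host}:2888:3888")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(zoo_cfg(["node1", "node2", "node3"]), end="")
```

Every server in the ensemble would get the same file; only the contents of its myid file differ.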

What I would do first is analyze:

  • how much data the cluster needs to process,
  • how much data it needs to "ingest",
  • how powerful your machines are,
  • how many consumers you are going to need,
  • how reliable your machines are.
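The ingest and machine-capacity questions above can be turned into a back-of-the-envelope disk estimate. The sketch below assumes a simple model that is not from the answer: data is kept for a fixed retention period, stored once per replica (the topic replication factor), and spread evenly across brokers, plus a safety headroom. The function name and all example numbers (5 MB/s ingest, 24 h retention, 2 brokers, 2 replicas) are hypothetical.

```python
# Back-of-the-envelope Kafka disk sizing.
# Assumed model (hypothetical, not from the answer): data is retained for a fixed
# period, stored replication_factor times, and spread evenly across brokers.

def kafka_disk_per_broker_gb(ingest_mb_per_s, retention_hours,
                             replication_factor, brokers, headroom=0.3):
    """Estimate the disk space (GB) each broker needs.

    headroom is an extra safety fraction (0.3 = 30%) on top of the raw estimate.
    """
    if replication_factor > brokers:
        # A partition cannot have more replicas than there are brokers to hold them.
        raise ValueError("replication_factor cannot exceed the number of brokers")
    total_mb = ingest_mb_per_s * retention_hours * 3600 * replication_factor
    per_broker_mb = total_mb / brokers
    return per_broker_mb * (1 + headroom) / 1024


if __name__ == "__main__":
    # Hypothetical load: 5 MB/s ingested, kept for 24 hours, 2 brokers, 2 replicas.
    estimate = kafka_disk_per_broker_gb(5, 24, replication_factor=2, brokers=2)
    print(f"~{estimate:.0f} GB per broker")
```

With these made-up inputs the estimate comes to roughly 548 GB per broker; the point is only that ingest rate, retention and replication multiply together, so real measurements of your own workload matter more than the formula.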

These are just a few factors to consider before you start building the infrastructure. If you want a rough estimate based on "just" 5 machines, assuming they are all equally powerful and have a good amount of memory (e.g., 32 GB per machine), you need at least a couple of Kafka nodes and at least 3 machines for ZooKeeper. A ZooKeeper ensemble of 2N + 1 servers tolerates N failures, so with 3 servers ZooKeeper can survive one machine failing.
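To make the 2N + 1 rule concrete: ZooKeeper keeps serving requests only while a strict majority (a quorum) of the ensemble is up, so an ensemble of n servers needs floor(n/2) + 1 of them alive and tolerates the rest failing. The small sketch below tabulates this; the function name zk_quorum is just illustrative.

```python
# Quorum arithmetic for a ZooKeeper ensemble: a strict majority must stay up.

def zk_quorum(ensemble_size):
    """Return (servers needed for a majority, failures tolerated)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    majority = ensemble_size // 2 + 1
    return majority, ensemble_size - majority


if __name__ == "__main__":
    for n in range(1, 6):
        majority, tolerated = zk_quorum(n)
        print(f"{n} servers: quorum {majority}, tolerates {tolerated} failure(s)")
```

The table shows why ensembles use an odd number of servers: 4 servers tolerate the same single failure as 3, and 2 servers tolerate none.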
