Reputation: 2378
I am new to Kafka; I am running Kafka on a single machine as of now. I want to run Kafka in a distributed environment on multiple machines. There is no proper documentation for this. Any documentation or suggestions on this will be really helpful.
Upvotes: 5
Views: 4301
Reputation: 63
Adding on to the previous answer by user2720864:
Let us assume that a Kafka system with the below configuration is needed:
7 Kafka nodes
3 ZooKeeper nodes
To achieve this, install 7 Kafka instances on 7 different servers/VMs, and in each of these instances set a different broker ID. This lets ZooKeeper identify the different Kafka nodes for bookkeeping and maintenance. Set broker.id=X in /config/server.properties, as sketched below.
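As a minimal sketch (the machine names are hypothetical placeholders; the same pattern repeats up to the seventh machine), the relevant line in each broker's config/server.properties would be:

    # config/server.properties on the first broker machine
    broker.id=1

    # config/server.properties on the second broker machine
    broker.id=2

    # ... and so on, up to broker.id=7 on the seventh machine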
To start the ZooKeepers, you can reuse 3 of the previous Kafka servers or use new servers. Once the servers on which the ZooKeepers run are decided, change /config/server.properties on every broker to point at them:
zookeeper.connect=hostname1:port1,hostname2:port2
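With three ZooKeepers, that line would look something like the following (zk1, zk2, zk3 are hypothetical hostnames; 2181 is the default ZooKeeper client port):

    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181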
In a distributed environment it is good to have 3 ZooKeepers. Only one ZooKeeper acts as the leader at any time; the other 2 act as failovers. When the leader fails, one of the other two takes over as leader.
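As a rough sketch of such an ensemble (the paths, hostnames, and timing values below are assumptions, not taken from the answer above), each of the three ZooKeeper servers would have a conf/zoo.cfg like this, plus a myid file under dataDir containing 1, 2, or 3 respectively:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888

Ports 2888 and 3888 are the conventional peer-connection and leader-election ports.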
I found this link to be very useful; it helped me clarify a lot of things about the Kafka architecture.
This is a good reference for all the configuration options in the Kafka property files.
Hope this helps!
Upvotes: 4
Reputation: 8171
Basically you need to do the following:
1) Set up Kafka on all the machines.
2) Configure the config/server1.properties file on each machine to specify a unique ID. You can do that by setting the broker.id property in the config file, e.g. broker.id=1, broker.id=2. This ID must be unique for every broker; it is how each node is identified in a Kafka cluster.
3) Start Kafka on all the nodes (see the sketch after this list).
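As a minimal sketch of steps 2 and 3 combined (the ZooKeeper hostnames are hypothetical; the start script is the standard one shipped with Kafka), each broker machine would carry a unique ID plus the shared ZooKeeper connection string in its properties file and then be started with that file:

    # in this broker's properties file: a unique id and the shared ZooKeeper connection string
    broker.id=1
    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181

    # start the broker on this machine using that properties file
    bin/kafka-server-start.sh config/server1.properties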
You can refer to Step 6: Setting up a multi-broker cluster on their official quick start page.
Also, here is a nice article worth taking a look at.
Upvotes: 3