emilly

Reputation: 10530

Kafka setup strategy for replication?

I have two VM servers (say S1 and S2) and need to install Kafka in cluster mode, with a topic that has only one partition and two replicas (one broker acting as leader, the other as follower) for reliability.

I got a high-level idea from this cluster setup guide and want to confirm whether the strategy below is correct.

  1. First set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions: Is Zookeeper a must for Kafka?
  2. Start the Kafka broker on both nodes. They can use the same port since they are hosted on different nodes.
  3. Create the topic on either node with one partition and a replication factor of two.
  4. ZooKeeper will select the broker on one node as leader and the other as follower.
  5. The producer will connect to any broker and start publishing messages.
  6. If the leader goes down, ZooKeeper will select the other node as leader automatically. I am not sure how a replication factor of 2 will be maintained now, as only one node is live?
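The broker side of the steps above can be sketched roughly as follows (the hostnames `s1`/`s2`, file paths, and the topic name are illustrative assumptions, not from the question):

```shell
# -- on each node: point the broker at the ZK ensemble and give it a unique id --
# s1: config/server.properties          s2: config/server.properties
#   broker.id=1                           broker.id=2
#   listeners=PLAINTEXT://s1:9092         listeners=PLAINTEXT://s2:9092
#   zookeeper.connect=s1:2181,s2:2181     zookeeper.connect=s1:2181,s2:2181

# start ZooKeeper first on both nodes, then the brokers
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties

# create the topic once, from either node
bin/kafka-topics.sh --create --bootstrap-server s1:9092 \
  --topic my-replicated-topic --partitions 1 --replication-factor 2
```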

Is the above strategy correct?

Useful resources

ISR

ISR vs replication factor

Upvotes: 1

Views: 2853

Answers (2)

H.Ç.T

Reputation: 3559

First set up ZooKeeper as a cluster on both nodes for high availability (HA). If I set up ZK on a single node only and that node goes down, the complete cluster will be down. Right? Is it mandatory to use ZK in the latest Kafka version as well? It looks like it is a must for older versions: Is Zookeeper a must for Kafka?

Answer: Yes. ZooKeeper is still a must until KIP-500 is released. ZooKeeper is responsible for electing the controller, storing metadata about the Kafka cluster, and managing broker membership (link). Ideally the number of ZooKeeper nodes should be at least 3; this way you can tolerate one node failure (2 healthy ZooKeeper nodes, a majority of the cluster, are still capable of electing a controller). You should also consider setting up the ZooKeeper cluster on machines other than the ones Kafka is installed on. That way the failure of a server won't cause the loss of both a ZooKeeper and a Kafka node.
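For reference, a minimal 3-node ensemble configuration might look like this (the hostnames `zk1`-`zk3` are placeholders). Note that with only 2 ZooKeeper nodes the quorum is also 2, so losing either node halts the ensemble, which is why 3 is the practical minimum:

```
# zoo.cfg, identical on all three ZooKeeper nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
# each node also needs a dataDir/myid file containing its own id (1, 2 or 3)
```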

Start the Kafka broker on both nodes. They can use the same port since they are hosted on different nodes.

Answer: You should first start the ZooKeeper cluster, then the Kafka cluster. Using the same port on different nodes is fine.

Create the topic on either node with one partition and a replication factor of two.

Answer: Partitions are used for horizontal scalability; if you don't need that, one partition is okay. With a replication factor of 2, one of the nodes will be the leader and the other the follower at any time. But this is not enough to completely avoid data loss while also providing HA. In the ideal configuration for avoiding data loss without compromising HA, you should have at least 3 Kafka brokers, a replication factor of 3 for topics, min.insync.replicas=2 as a broker config, and acks=all as a producer config. (You can check this for more information.)
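Under those recommendations, the durable three-broker variant could be expressed as follows (broker address and topic name are illustrative):

```shell
# topic with 3 replicas that refuses writes unless at least
# 2 replicas are in sync (min.insync.replicas=2)
bin/kafka-topics.sh --create --bootstrap-server b1:9092 \
  --topic orders --partitions 1 --replication-factor 3 \
  --config min.insync.replicas=2

# producer waits for all in-sync replicas to acknowledge each write
bin/kafka-console-producer.sh --bootstrap-server b1:9092 \
  --topic orders --producer-property acks=all
```

Together these settings mean a write is only acknowledged once it is on at least two brokers, so one broker can fail without data loss.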

ZooKeeper will select the broker on one node as leader and the other as follower.

Answer: The controller broker is responsible for maintaining the leader/follower relationship for all partitions. One broker will be the partition leader and the other the follower. You can check partition leaders/followers with this command:

bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic

The producer will connect to any broker and start publishing messages.

Answer: Yes. Setting only one broker in bootstrap.servers is enough to connect to the Kafka cluster, but for redundancy you should provide more than one broker in bootstrap.servers.

bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
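In practice that means listing both brokers when connecting, for example (hostnames assumed from the question):

```shell
# either broker can serve the initial metadata request, so the
# producer can still connect if one of the two is down
bin/kafka-console-producer.sh \
  --bootstrap-server s1:9092,s2:9092 \
  --topic my-replicated-topic
```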


If the leader goes down, ZooKeeper will select the other node as leader automatically. I am not sure how a replication factor of 2 will be maintained now, as only one node is live?

Answer: If the controller broker goes down, ZooKeeper will select another broker as the new controller. If the broker that is the leader of your partition goes down, one of the in-sync replicas will become the new leader (the controller broker is responsible for this). But of course, if you have just two brokers, replication won't be possible while one is down. That's why you should have at least 3 brokers in your Kafka cluster.
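One way to watch this failover happen, as a sketch against the two-broker setup from the question:

```shell
# run on S1: stop the broker there
bin/kafka-server-stop.sh

# run on S2: leadership moves to the surviving broker and the ISR
# shrinks to one entry; the partition stays available but is
# temporarily under-replicated until broker 1 comes back
bin/kafka-topics.sh --describe \
  --bootstrap-server s2:9092 --topic my-replicated-topic
```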

Upvotes: 3

Elad Leev

Reputation: 938

Yes - ZooKeeper is still needed in Kafka 2.4, but you can read about KIP-500, which plans to remove the dependency on ZooKeeper in the near future and use the Raft algorithm to form the quorum.

As you already understood, if you install ZK on a single node it will run in standalone mode and you won't have any resiliency. The classic ZK ensemble consists of 3 nodes, which allows you to lose 1 ZK node.

After pointing your Kafka brokers to the right ZK cluster you can start your brokers and the cluster will be up and running.

In your example, I would suggest you add another node in order to gain better resiliency and meet the replication factor that you wanted, while still being able to lose one node without losing data.

Bear in mind that using a single partition means you are bound to a single active consumer per consumer group. The rest of the consumers will be idle.
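You can see this by starting two consumers in the same group against a one-partition topic (the group name here is arbitrary):

```shell
# start this same command in two terminals: only one consumer
# receives messages, because a partition is assigned to at most
# one member of a consumer group; the second consumer sits idle
bin/kafka-console-consumer.sh --bootstrap-server s1:9092 \
  --topic my-replicated-topic --group demo-group
```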

I suggest you read this blog about Kafka best practices and how to choose the number of topics/partitions in a Kafka cluster.

Upvotes: 2
