Developer

Reputation: 259

Kafka and ZooKeeper high availability configuration

I want to set up a highly available Kafka and ZooKeeper deployment in my ecosystem. I have 2 data centers, with 3 physical servers in each data center.

DC1

Server 1 - 1st Kafka Broker

Server 2 - 2nd Kafka Broker

Server 3 - 3rd Kafka Broker

So, a Kafka cluster with 3 brokers

ZooKeeper ensemble - 3 ZooKeeper instances on the same 3 physical servers

DC2

Same configuration as DC1

Now my questions are:

  1. By doing the above setup, are we ensuring fault tolerance and full HA?
  2. Is an active-active setup or an active-passive setup preferred, and why?
  3. How do we mirror data asynchronously across data centers?

Upvotes: 1

Views: 1271

Answers (1)

OneCricketeer

Reputation: 191671

By doing the above setup, are we ensuring fault tolerance and full HA?

Sure, but only within each data center.

In AWS (and, I'd guess, other clouds), you additionally have Availability Zones (AZs): geographically close data centers that are still isolated enough from one another that losing connectivity to one zone won't take down applications distributed over multiple zones. To get truly high availability, best practice is to stretch the cluster across AZs.
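To make "stretching across AZs" a little more concrete: Kafka brokers have a `broker.rack` setting, and when every broker sets it, topic creation spreads each partition's replicas across different racks. Treating each AZ as a rack is a common way to use it. The sketch below only generates the relevant `server.properties` lines; the one-broker-per-AZ layout, the broker IDs and the AZ names are hypothetical.

```python
# Sketch: emit server.properties lines that tag each broker with its AZ via
# broker.rack, so rack-aware replica assignment can place a partition's
# replicas in different AZs. Broker IDs and AZ names are hypothetical.

def broker_properties(broker_id: int, az: str) -> str:
    """Return the server.properties lines identifying a broker and its AZ."""
    return f"broker.id={broker_id}\nbroker.rack={az}\n"


# Hypothetical layout: one broker in each of three AZs of a single region.
BROKERS = {1: "az-a", 2: "az-b", 3: "az-c"}

if __name__ == "__main__":
    for broker_id, az in BROKERS.items():
        print(f"# server.properties for broker {broker_id}")
        print(broker_properties(broker_id, az))
```

With three racks and a replication factor of 3, each partition can have one replica per AZ, so losing a single AZ still leaves two in-sync copies.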

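Also, 5 ZooKeeper servers would be preferred: an ensemble stays available only while a majority of its servers are up, so 5 servers can lose 2 machines and be okay, whereas 3 servers can lose only 1.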
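The quorum arithmetic behind the "lose 2 machines" claim can be sketched as follows. The `survives_loss_of_dc` helper is a hypothetical illustration of why a single ensemble stretched evenly over the question's two data centers would not survive losing either one: 6 servers need 4 for a majority, and only 3 remain.

```python
# Sketch of ZooKeeper majority-quorum arithmetic. Function names are
# illustrative, not part of any ZooKeeper API.

def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a quorum."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a quorum remains."""
    return ensemble_size - majority(ensemble_size)


def survives_loss_of_dc(servers_per_dc: list[int]) -> bool:
    """True if a quorum remains no matter which single DC goes down."""
    total = sum(servers_per_dc)
    return all(total - lost >= majority(total) for lost in servers_per_dc)


if __name__ == "__main__":
    for n in (3, 5, 6):
        print(f"{n} servers: quorum {majority(n)}, tolerates {tolerated_failures(n)}")
    # One ensemble split 3 + 3 across two data centers:
    print("3+3 survives losing a DC:", survives_loss_of_dc([3, 3]))
```

Note that going from 5 to 6 servers buys nothing: both tolerate 2 failures, which is why odd ensemble sizes are the norm.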

Is an active-active setup or an active-passive setup preferred, and why?

If you are actively mirroring Kafka data to a secondary cluster, then it's not really "passive", IMO.

There is no way that I know of to "seamlessly" move a Kafka client over to a "failover cluster" without actually editing the client configuration to use that "backup" set of bootstrap servers. Plus, if data is sent to one cluster while the other is unavailable for some amount of time, and the other then comes back, your consumer applications need to reconcile the resulting differences in topic data.
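A minimal sketch of what that explicit switch looks like on the client side, assuming the application builds its config from two hypothetical broker lists (one per data center). `bootstrap.servers` and `group.id` are standard Kafka client settings; the hostnames and the `use_backup` flag are illustrative, and something outside the client (an operator, a deploy, a config service) still has to decide when to flip it.

```python
# Sketch: failover is an explicit config change, not something the Kafka
# client does on its own. Hostnames and the use_backup flag are hypothetical.

PRIMARY_BOOTSTRAP = ["dc1-kafka1:9092", "dc1-kafka2:9092", "dc1-kafka3:9092"]
BACKUP_BOOTSTRAP = ["dc2-kafka1:9092", "dc2-kafka2:9092", "dc2-kafka3:9092"]


def client_config(use_backup: bool, group_id: str) -> dict[str, str]:
    """Build a consumer config pointing at exactly one cluster."""
    servers = BACKUP_BOOTSTRAP if use_backup else PRIMARY_BOOTSTRAP
    return {
        "bootstrap.servers": ",".join(servers),
        "group.id": group_id,
    }


if __name__ == "__main__":
    print(client_config(use_backup=False, group_id="orders-service"))
    print(client_config(use_backup=True, group_id="orders-service"))
```

The resulting dict could be handed to a client library that accepts librdkafka-style settings; after switching, the consumer's position in the backup cluster still has to be worked out, which is part of the reconciliation problem described above.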

How do we mirror data asynchronously across data centers?

Kafka ships with MirrorMaker. Other tools exist, such as Confluent Replicator, which adds HA features that MirrorMaker (and similar Kafka mirroring tools) currently lack. MirrorMaker 2 closes most of these gaps.
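As a rough illustration of MirrorMaker 2 setup, the sketch below generates a minimal properties file of the kind passed to `bin/connect-mirror-maker.sh`. It uses MM2's `clusters`, `<alias>.bootstrap.servers`, `<source>-><target>.enabled` and `<source>-><target>.topics` keys; the cluster aliases, hostnames and the topic pattern are hypothetical, and a real deployment will need more settings (security, replication factors for internal topics, and so on).

```python
# Sketch: build a minimal MirrorMaker 2 properties file. Aliases, hosts and
# the topic regex are hypothetical; only the basic MM2 keys are covered.

def mm2_properties(clusters: dict[str, str], flows: list[tuple[str, str]],
                   topics: str = ".*", replication_factor: int = 3) -> str:
    """Return MM2 config text for the given clusters and replication flows."""
    lines = [f"clusters = {', '.join(clusters)}"]
    for alias, servers in clusters.items():
        lines.append(f"{alias}.bootstrap.servers = {servers}")
    for source, target in flows:
        # Each flow mirrors matching topics from source into target.
        lines.append(f"{source}->{target}.enabled = true")
        lines.append(f"{source}->{target}.topics = {topics}")
    lines.append(f"replication.factor = {replication_factor}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mm2_properties(
        {"dc1": "dc1-kafka1:9092,dc1-kafka2:9092",
         "dc2": "dc2-kafka1:9092,dc2-kafka2:9092"},
        [("dc1", "dc2")],  # add ("dc2", "dc1") for active-active
    ))
```

With MM2's default replication policy, a topic such as `orders` from `dc1` appears in `dc2` as `dc1.orders`, so mirroring in both directions does not loop records back to their origin.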

Confluent recently updated their whitepaper, which is described as

a practical guide to configuring multiple Apache Kafka clusters so that if a disaster scenario strikes, you have a plan for failover, failback, and ultimately successful recovery

You can download it here

Upvotes: 1
