Aditya

Reputation: 79

Why is a single-node, multiple-broker Kafka cluster not preferred?

I am trying to deploy Kafka to production. I wanted to know why a single-node, multiple-broker Kafka setup is not preferred. A few people suggested that if multiple brokers are used on a single node, they should be allocated separate disks, but the reason for doing so is not clear to me.

Can someone please explain the impact of running a single broker vs. multiple brokers on a single node?

Upvotes: 4

Views: 9330

Answers (4)

Giorgos Myrianthous

Reputation: 39800

Every topic is a particular stream of data (similar to a table in a database). Topics are split into partitions (as many as you like), where each message within a partition gets an incremental ID, known as its offset, as shown below.

Partition 0:

+---+---+---+-----+
| 0 | 1 | 2 | ... |
+---+---+---+-----+

Partition 1:

+---+---+---+---+----+
| 0 | 1 | 2 | 3 | .. |
+---+---+---+---+----+
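
With the command-line tools that ship with Kafka, such a topic could be created as follows (a minimal sketch, assuming a recent Kafka version and a broker listening on localhost:9092; the topic name topic1 is made up for illustration):

# Create a topic with 2 partitions and no replication
kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --topic topic1 \
    --partitions 2 \
    --replication-factor 1

Offsets are then assigned per partition: the first message written to partition 0 gets offset 0, the next one offset 1, and so on, independently of the offsets in partition 1.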

Now a Kafka cluster is composed of multiple brokers. Each broker is identified by an ID and can contain certain topic partitions.

Example of 2 topics (with 3 and 2 partitions respectively):

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 2       |
|   Partition 1     |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 2    |
|                   |
|                   |
|     Topic 2       |
|   Partition 0     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Note that data is distributed (and Broker 3 doesn't hold any data of topic 2).
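
You can check how Kafka has spread the partitions over the brokers with the --describe option (a sketch; the broker IDs match the diagrams above, but the actual layout is chosen by Kafka at topic-creation time):

kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic topic1
# Example output, one line per partition, showing the broker that leads it:
# Topic: topic1   Partition: 0    Leader: 1    Replicas: 1    Isr: 1
# Topic: topic1   Partition: 1    Leader: 3    Replicas: 3    Isr: 3
# Topic: topic1   Partition: 2    Leader: 2    Replicas: 2    Isr: 2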

Topics should have a replication factor > 1 (usually 2 or 3) so that when a broker is down, another one can still serve the data of the topic. For instance, assume that we have a topic with 2 partitions and a replication factor of 2, as shown below:

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|                   |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 1       |
|   Partition 1     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
+-------------------+

Now assume that Broker 2 has failed. Brokers 1 and 3 can still serve the data for topic 1. So a replication factor of 3 is always a good idea, since it allows one broker to be taken down for maintenance and another to fail unexpectedly at the same time. Apache Kafka therefore offers strong durability and fault-tolerance guarantees. Note, however, that these guarantees assume the brokers run on separate machines; if all brokers share a single node, every replica disappears together when that node goes down, which is the main reason the single-node, multi-broker setup is discouraged.
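
To reproduce the picture above, you would create the topic with --replication-factor 2 and then inspect the replica placement (a sketch; Kafka chooses the actual assignment, and the output shown is illustrative):

kafka-topics.sh --create \
    --bootstrap-server localhost:9092 \
    --topic topic1 \
    --partitions 2 \
    --replication-factor 2

kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic topic1
# Replicas lists every broker holding a copy of a partition; Isr
# (the in-sync replicas) shrinks when a broker fails. After Broker 2
# dies, the output would look roughly like:
# Topic: topic1   Partition: 0    Leader: 1    Replicas: 1,2    Isr: 1
# Topic: topic1   Partition: 1    Leader: 3    Replicas: 2,3    Isr: 3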

Upvotes: 5

sn.anurag

Reputation: 623

If you have multiple brokers on the same node, it is possible to end up with all the partitions (and all their replicas) of a topic on that single node. If the node fails, the topic becomes unavailable despite its replication factor.
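
Kafka's default replica assignment only distinguishes broker IDs, not the machines the brokers run on. If you must run several brokers per machine, the rack-awareness feature (available since Kafka 0.10) lets you tell Kafka which brokers share a failure domain, so that replicas of a partition are spread across machines (a sketch of the relevant server.properties entries; the rack names are made up):

# server.properties of a broker on machine A
broker.id=1
broker.rack=machine-a

# server.properties of a broker on machine B
broker.id=2
broker.rack=machine-b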

Upvotes: 0

Abhishek

Reputation: 1225

Like most things, the answer to this question is 'it depends'. Your question is generic in nature; it would help if you could be more specific about which attributes of your system you are interested in, such as performance or availability. From a performance standpoint, having many broker instances on one box (node) is fine if it has a lot of resources. But it will not help you from an availability perspective: your system will have a single point of failure and is at huge risk if that one node happens to go down (unless you have multiple such high-resource nodes at your disposal :-) )

Upvotes: 0

for_stack

Reputation: 22886

If you have multiple brokers on a single node with a single disk, all of the brokers have to read from and write to that one disk. Kafka gets much of its speed from sequential disk I/O, and several brokers interleaving their writes on the same disk turn the access pattern into effectively random reads and writes, so the cluster will perform poorly.

In contrast, if you have multiple disks on a single node and each broker reads from and writes to its own disk, you avoid the random read/write problem.
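
As a sketch, two brokers on the same node would each get their own server.properties with a distinct ID, a distinct port, and log.dirs pointing at a different physical disk (the paths and ports here are made-up examples):

# server.properties of broker 0
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/disk1/kafka-logs

# server.properties of broker 1
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/disk2/kafka-logs

This keeps each broker's log segments on its own disk, so each disk still sees mostly sequential I/O.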

UPDATE

Also, if you have too many brokers on a single machine, network bandwidth might become a bottleneck, since all of the brokers have to share the machine's network interface.

Upvotes: 7
