Ananya Antony

Reputation: 358

Can Kafka partitions be spread across multiple Kafka cluster nodes?

My application has a list of Kafka cluster nodes specified in the spring.kafka.bootstrap-servers property and listens to topics on all of these nodes.

If I were to create a topic on one of these nodes with, let's say, 5 partitions, would these partitions be spread across the multiple nodes, or would they all be created on a single node? Also, how can I find out which node a topic partition actually exists on?

Upvotes: 3

Views: 2944

Answers (3)

Michael Peng

Reputation: 69

So a Kafka topic is a logical concept, not a physical unit: it belongs to the cluster, not to any single node.

Upvotes: 0

si889

Reputation: 31

Like the other answer said, a topic is not owned by or created for a particular node; it is created for the cluster as a whole. Whenever a topic is created, its partitions are divided among the cluster nodes (brokers). Each partition has one leader node and, depending on the replication factor, some follower replica nodes. Producers write to the leader node, and Kafka internally replicates the data to the replica nodes. By default, consumers read a partition's data from its leader node.
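
To get a feel for how 5 partitions might end up on 3 brokers, here is a simplified sketch. It is NOT Kafka's exact assignment algorithm (the real one randomizes the starting broker, shifts follower placement, and can be rack-aware); it just assumes broker ids 0..N-1, places leaders round-robin starting at broker 0, and puts each follower on the next broker along. The function name assign_partitions is hypothetical.

```shell
#!/bin/sh
# Simplified round-robin placement, for illustration only.
# Assumptions: broker ids are 0..brokers-1, the first leader is broker 0,
# and replication factor <= number of brokers.
assign_partitions() {
  partitions=$1   # number of partitions in the topic
  brokers=$2      # number of brokers in the cluster
  rf=$3           # replication factor
  p=0
  while [ "$p" -lt "$partitions" ]; do
    leader=$(( p % brokers ))          # leaders rotate across brokers
    replicas=$leader
    r=1
    while [ "$r" -lt "$rf" ]; do
      # each follower goes on the next broker after the leader (wrapping around)
      replicas="$replicas,$(( (leader + r) % brokers ))"
      r=$(( r + 1 ))
    done
    echo "Partition: $p Leader: $leader Replicas: $replicas"
    p=$(( p + 1 ))
  done
}

# 5 partitions, 3 brokers, replication factor 2
assign_partitions 5 3 2
```

With these inputs the leaders come out as brokers 0, 1, 2, 0, 1, so no single broker holds the whole topic, and every partition has a copy on a second broker.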

For a better understanding/visualisation of topic partition distribution in Kafka, you can use tools like Kafdrop. You can follow the steps in the readme section of the repo for setup, and you can download the latest binary from here. In the UI, you can see the leader and replica nodes for each partition of a topic.

The setup is pretty straightforward and I personally find the tool VERY useful!

Upvotes: 2

mjuarez

Reputation: 16844

You don't actually create topics on one specific node in a Kafka cluster. When you issue a request to create a topic, the partitions are automatically spread out across the nodes (brokers) belonging to the cluster, and the replicas are spread out as well. That is how Kafka handles high availability: if one of the nodes is down, some other node holding a replica has the required data, so, provided the replication factor is greater than 1, there is no downtime or impact to users of the cluster.
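
The high-availability point can be checked mechanically: for a given replica assignment, does every partition still have a copy somewhere if one broker goes down? The sketch below assumes a hypothetical input format of one "partition replicas" pair per line (e.g. "0 1,2,0"), and the function name check_broker_failure is made up for illustration. It only counts replicas; it ignores whether the surviving replicas are actually in sync, which real leader election also depends on.

```shell
#!/bin/sh
# Hypothetical helper: reads "partition replica,list" lines on stdin and,
# for one failed broker id, reports how many replicas of each partition remain.
check_broker_failure() {
  down=$1   # id of the broker assumed to be down
  awk -v down="$down" '{
    n = split($2, reps, ",")
    alive = 0
    for (i = 1; i <= n; i++) if (reps[i] != down) alive++
    status = (alive > 0) ? "OK" : "UNAVAILABLE"
    print "Partition " $1 ": " alive " live replica(s) -> " status
  }'
}

# Example: 5 partitions, replication factor 2, spread over brokers 0-2.
printf '%s\n' "0 0,1" "1 1,2" "2 2,0" "3 0,1" "4 1,2" | check_broker_failure 1
```

With a replication factor of 2, every partition above survives the loss of broker 1. Feed it a single-replica assignment such as "0 1" instead and that partition is reported UNAVAILABLE, which is why the replication factor matters more than the partition count for availability.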

You can issue a --describe command like this:

> bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic

    Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
        Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0

That will give you a list of the partitions for your topic, where they are located, which node is the leader for each partition (the one consumers are told to consume from when they need data from that partition), and some more info like the set of in-sync replicas (ISR) and the replication factor.
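
If you just want a quick "which broker leads which partition" view without a UI, you can post-process the --describe output. This is a sketch that assumes the whitespace-separated "Partition: N Leader: N Replicas: a,b,c" layout shown above; other Kafka versions may add columns (the parser skips anything it doesn't recognize), and the function name summarize_partitions is hypothetical.

```shell
#!/bin/sh
# Turn `kafka-topics.sh --describe` output into "partition -> leader (replicas)" lines.
# Header lines (PartitionCount:, ReplicationFactor:, ...) have no "Partition:" field
# and are skipped.
summarize_partitions() {
  awk '{
    part = ""; leader = ""; replicas = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") part = $(i + 1)
      else if ($i == "Leader:") leader = $(i + 1)
      else if ($i == "Replicas:") replicas = $(i + 1)
    }
    if (part != "") print "partition " part " -> leader broker " leader " (replicas on " replicas ")"
  }'
}

# Sample output from above. Against a live cluster you would pipe the real command:
#   bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-replicated-topic | summarize_partitions
summarize_partitions <<'EOF'
Topic:my-replicated-topic   PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: my-replicated-topic  Partition: 0    Leader: 1   Replicas: 1,2,0 Isr: 1,2,0
EOF
```

For the sample this prints a single line saying partition 0 is led by broker 1 with replicas on brokers 1, 2 and 0.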

There's more info at the official Kafka docs here and here.

Bear in mind that the list your client passes as bootstrap-server is not a complete list of brokers from which to read data. It just names one (or more) brokers from which to pull metadata about the cluster. When the client reads from or writes to a given topic and partition, it talks directly to the relevant broker that leads that partition, regardless of which brokers were listed in the bootstrap configuration. You can see more about this process here and here.

Upvotes: 3
