Reputation: 47
I've recently written an application that connects to a running Kafka instance and creates multiple topics on demand via a REST endpoint, in a loop. I'm logging every 'create topic' call, and it tends to be extremely fast (like 100 ms to delegate creation of 10k topics). Then processing on Kafka's side starts, lasts for several dozen seconds, and suddenly stops without any error. Listing the data directory shows that Kafka created about 2.5k directories, while the delegation was for 10k. The next endpoint call again creates only a similar number of topics.
Increasing the number of Kafka instances doesn't change the results (switching to ZooKeeper-less Kafka gives the same results, too). What am I doing wrong? Is it an OS limitation on creating directories (syslog is empty)?
Yeah, I know that Kafka is not built for handling many topics, but as far as I know it should handle at least ~100k (and more than a few million with ZooKeeper-less KRaft).
My setup:
version: '3.5'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka-1:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      DOCKER_API_VERSION: 1.22
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
I'm doing it in a kinda stupid way:
for (int i = 0; i < 10_000; i++) {
    adminClient.createTopics(List.of(new NewTopic(UUID.randomUUID().toString(), 1, (short) 1)));
}
When I build a collection first and then delegate creation in a single call, it succeeds, but still - what if I wanted to do it record by record, endpoint by endpoint?
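For reference, the collection-first variant that succeeds for me looks roughly like this (the broker address is just a placeholder); since createTopics returns futures, the sketch also blocks on all() so the client actually waits until the brokers have finished:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.UUID;

public class BatchTopicCreation {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address

        try (AdminClient adminClient = AdminClient.create(props)) {
            // Build the whole batch up front instead of sending one request per topic.
            List<NewTopic> topics = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                topics.add(new NewTopic(UUID.randomUUID().toString(), 1, (short) 1));
            }

            // createTopics() is asynchronous and returns immediately;
            // blocking on all() waits until the brokers have processed the whole request.
            adminClient.createTopics(topics).all().get();
        }
    }
}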
Upvotes: 0
Views: 274
Reputation: 56
Well, there are limitations depending on your setup: the number of brokers/ZooKeeper nodes, configuration, hardware, and operating system. For OS limitations, see https://kafka.apache.org/documentation/#os.
You can see from the apache-kafka-supports-200k-partitions-per-cluster blog post how they set up a cluster to support 200k topic partitions.
At that time (Kafka 1.1.0), here is what they recommended:
we recommend each broker to have up to 4,000 partitions and each cluster to have up to 200,000 partitions.
But for Kafka 2.8.0, from Kafka: The Definitive Guide, 2nd Edition:
Currently, in a well-configured environment, it is recommended to not have more than 14,000 partitions per broker and 1 million replicas per cluster.
Though, regarding your comment:
I'm logging every 'create topic' call, and it tends to be extremely fast (like 100 ms to delegate creation of 10k topics).
I don't think Kafka actually creates 10k topics within 100 ms. Here is what I found in my own experiment (I'm trying to set up a cluster that handles more than 100k partitions): I wrote my own producer client in C++ using librdkafka. The producer is asynchronous. I could easily submit a message to 10k not-yet-existing topics to force topic creation, but it took some time to actually get a successful ACK from the broker. And the more online partitions in the cluster, the longer you may have to wait for that ACK.
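Just to illustrate the idea in Java (my actual experiment was in C++ with librdkafka, so the broker address, the reliance on auto.create.topics.enable=true, and all names below are assumptions on my part): submitting the records is quick, but waiting for the broker ACKs - i.e. for the topics to really exist - is where the time goes.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.UUID;
import java.util.concurrent.Future;

public class TopicCreationByProducing {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long start = System.currentTimeMillis();

            // Sending to topics that don't exist yet triggers auto-creation
            // (assumes auto.create.topics.enable=true on the brokers).
            List<Future<RecordMetadata>> acks = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                String topic = UUID.randomUUID().toString();
                // send() is asynchronous, though it may briefly block fetching
                // metadata for a brand-new topic (bounded by max.block.ms).
                acks.add(producer.send(new ProducerRecord<>(topic, "probe")));
            }

            // The slow part: wait for every broker ACK, i.e. until each topic's
            // partition is actually online and the write has been acknowledged.
            for (Future<RecordMetadata> ack : acks) {
                ack.get();
            }
            System.out.println("All ACKs after " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}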
I would recommend equipping your Kafka cluster with a monitoring tool so you can see the health of your setup in real time (if your cluster is not too busy).
Upvotes: 1