Reputation: 13835
I am learning Kafka and trying to create a topic for my recent search application. The data being pushed to kafka topics is assumed be a high number.
My kafka cluster have 3 brokers and there are already topics created for other requirements.
Now what should be the number of partitions which i should choose for my recent search topic? And what if i do not provide the partition number explicitly? What are things needs to be considered when choosing the partition number?
Upvotes: 8
Views: 12428
Reputation: 715
In general More partitions lead to higher throughput. But few things to consider - more partitions means consumer has to have higher memory system to consume data.
Interesting articles I found about it
Upvotes: 1
Reputation: 1727
I would consider evaluating two main things before deciding on the no of partitions.
First point is, how the partitions, consumers of a consumer group act together. In simple words, One consumer can consume messages from more than one partitions but one partition can't be consumed by more than one consumer. That means, it makes sense to have no.of partitions >= no.of consumers in a consumer group. Otherwise you will end up having consumers without any partition is being assigned.
Second point is, what's your requirement from latency vs throughout point of view. In simple words, Latency is the time required to perform some action or to produce some result. Latency is measured in units of time -- hours, minutes, seconds, nanoseconds or clock periods. Throughput is the number of such actions executed or results produced per unit of time
Now, coming back to the comparison from kafka stand point, In general, more partitions in a Kafka cluster leads to higher throughput. But, you should be careful with this number if you are really looking for low latency.
Upvotes: 4
Reputation: 3312
This will depend on the throughput of your consumers. If you are producing 100 messages a second and your consumers can process 10 messages a second then you'll want at least 10 partitions (produce / consume) with 10 instances of your consumer. If you want this topic to be able to handle future growth, then you'll want to increase the partition count even higher so that you can add more instances of your consumer to handle the new volume.
Another piece of advice would be to make your partition count a highly divisible number so that you can scale up/down consumers while keeping their load balanced. For example, if you choose 10 partitions then you would have to have 1, 2, 5, or 10 instances of your consumer to keep them each processing from the same number of partitions. If you choose 12 partitions instead then you could be balanced with either 1, 2, 3, 4, 6, or 12 instances of your consumer.
Upvotes: 18