user_default

Reputation: 416

Number of Partitions vs Producer Throughput in Apache Kafka

Does the number of partitions have an impact on producer throughput in Kafka? (I understand that the number of partitions is the upper bound on the degree of parallelism on the consumer side, but does it affect producer performance?)

I used the producer performance tool in Kafka to test this on a Kafka cluster set up on AWS. I observed that for 3, 6 and 20 partitions the aggregate throughput of the cluster was about the same (around 200 MB/s). I would appreciate it if you could help me clarify this issue.

Thank you.

Upvotes: 5

Views: 2193

Answers (1)

Paul Brebner

Reputation: 29

An answer in two parts:

  1. From the Kafka consumer perspective: yes, partitions give improved throughput for Kafka consumers. But I found that you really want to minimise the number of Kafka consumers (and therefore partitions) if you want good scalability. Here's a link to a blog I wrote last year on a Kafka IoT application (see section 2.3)
  2. From the Kafka producer perspective: throughput drops with more partitions. Last week I ran some benchmarks with Kafka producers and different numbers of partitions and found that throughput drops off significantly as the partition count rises. To "size" a Kafka cluster correctly, the only solution is to increase the Kafka cluster size (nodes and/or cores) until you reach the target capacity with the required number of partitions. I needed 2M writes/s and 200 partitions (for concurrency on the consumer side). On a 6 node cluster with 4 cores per node I could do 2.1M writes/s with 6 partitions, but only 1.2M writes/s with 200 partitions. On a 6 node cluster with 8 cores per node I could get 4.6M writes/s with 6 partitions, and 2.4M writes/s with 200 partitions, slightly more than my target throughput. I haven't blogged about these results yet, but here's a link to the current blog series (Anomalia Machina).
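To put the figures quoted above side by side, here is a minimal Python sketch that computes the relative throughput drop from 6 to 200 partitions for each cluster size and checks it against the 2M writes/s requirement. It assumes the 2.4M figure is the measured result on the 8 core cluster; the names `MEASURED`, `TARGET`, `drop_percent` and `meets_target` are made up for this illustration.

```python
# Throughput figures quoted in the answer, in million writes/s.
# Keys are (cores per node, partitions); both clusters have 6 nodes.
# Assumption: 2.4 is read as the measured value with 200 partitions.
MEASURED = {
    (4, 6): 2.1,
    (4, 200): 1.2,
    (8, 6): 4.6,
    (8, 200): 2.4,
}
TARGET = 2.0  # required million writes/s with 200 partitions


def drop_percent(cores: int) -> float:
    """Relative throughput loss going from 6 to 200 partitions."""
    few = MEASURED[(cores, 6)]
    many = MEASURED[(cores, 200)]
    return round(100 * (few - many) / few, 1)


def meets_target(cores: int) -> bool:
    """True if the cluster reaches TARGET with 200 partitions."""
    return MEASURED[(cores, 200)] >= TARGET


for cores in (4, 8):
    print(cores, drop_percent(cores), meets_target(cores))
```

On these numbers the drop is a bit over 40% on both cluster sizes, and only the 8 core cluster clears the target with 200 partitions.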

Note: Throughput can also be increased by (a) reducing the replication factor or (b) writing to only a subset of the partitions (!), but then you probably don't need all the partitions.
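As a rough illustration of option (b), the sketch below maps record keys onto a fixed subset of partition numbers. It is not Kafka code: `choose_partition` and `ACTIVE` are hypothetical names, and `zlib.crc32` is only a stand-in hash, not the one Kafka's default partitioner uses.

```python
import zlib


def choose_partition(key: bytes, active_partitions: list[int]) -> int:
    """Map a record key onto one of the partitions that are in use.

    zlib.crc32 is a stand-in hash for illustration only.
    """
    if not active_partitions:
        raise ValueError("need at least one active partition")
    return active_partitions[zlib.crc32(key) % len(active_partitions)]


# Hypothetical setup: the topic has 200 partitions, but only the
# first 6 receive writes.
ACTIVE = list(range(6))
used = {choose_partition(f"sensor-{i}".encode(), ACTIVE) for i in range(1000)}
print(sorted(used))
```

The chosen number would then be supplied as the explicit partition when sending a record. The same key always lands on the same partition, so per-key ordering is kept within the subset.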

Upvotes: 1
