Reputation: 126
I have a throughput problem with my Kafka Streams application. I am reading a topic with more than 90 million records. With my Kafka Streams app, which basically only prints each record, I get a throughput of ~4K records/second. However, if I consume the exact same topic with a basic kafka-avro-console-consumer command line, I get ~80K records/second! Are there known limitations that would explain why a Streams app is less performant than the plain consumer underlying kafka-avro-console-consumer? Any guidance on which Streams configs I should tweak to get better performance?
My config is:
Properties configs = new Properties();
configs.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, kafkaConfig.getBootstrapServer());
configs.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
kafkaConfig.getSchemaRegistryServer());
configs.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
configs.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, EARLIEST);
configs.put(StreamsConfig.APPLICATION_ID_CONFIG, "KS-test3");
and the topology simply does:
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.stream(scheduleEventTopic)
.foreach(this::printRecord);
return streamsBuilder.build();
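When comparing the two consumers, it may help to count records rather than print them, since console output has its own cost. The following is only a sketch: `ThroughputCounter` is a hypothetical helper (not part of Kafka) that could be called from the `foreach` callback in place of `printRecord`, and it computes the average records per second over an elapsed time that the caller measures.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of Kafka: counts records and derives an average rate.
public class ThroughputCounter {
    // LongAdder stays cheap when several stream threads increment concurrently.
    private final LongAdder count = new LongAdder();

    // Call once per record, e.g. from the foreach callback.
    public void record() {
        count.increment();
    }

    public long total() {
        return count.sum();
    }

    // Average records per second over the given elapsed time in nanoseconds.
    public double ratePerSecond(long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return count.sum() / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        ThroughputCounter counter = new ThroughputCounter();
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            counter.record(); // stands in for one consumed record
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d records, %.0f records/s%n",
                counter.total(), counter.ratePerSecond(elapsed));
    }
}
```

In a topology this would look like `.foreach((k, v) -> counter.record())`, with the rate logged periodically from a separate thread.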
Upvotes: 1
Views: 895
Reputation: 126
I actually found my problem. commit.interval.ms was set to 0 to disable batching in my aggregate. Instead, I now use cache.max.bytes.buffering to get the same effect without hurting performance. My throughput went from 4K tps to 100K tps.
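A minimal sketch of that change, using plain string keys so it needs no Kafka classes to compile. The values are assumptions, not the poster's actual settings: a cache size of 0 is the usual way to turn off record caching so every aggregate update is forwarded, and 30000 ms is the documented default commit interval for at-least-once processing. The class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: builds only the two settings discussed in this answer.
public class StreamsTuningSketch {
    static Properties tuningProps() {
        Properties props = new Properties();
        // Keep the commit interval non-zero; committing on every record
        // was the cause of the low throughput described above.
        props.put("commit.interval.ms", "30000");
        // Assumed value: 0 disables the record cache, so aggregate updates
        // are not batched before being forwarded downstream.
        props.put("cache.max.bytes.buffering", "0");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; in a real app they would be merged into the
        // Properties object passed to KafkaStreams.
        tuningProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These entries can be added to the same `configs` object shown in the question, for example with `configs.putAll(StreamsTuningSketch.tuningProps())`.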
Upvotes: 1
Reputation: 9347
Try increasing the value of max.poll.records. This setting is the maximum number of records returned by a single poll() call.
max.poll.records (1000 default)
You may also want to look at max.poll.interval.ms, the maximum time allowed between consecutive poll() calls, and try reducing it to see whether it helps.
Also, you may want to increase the number of stream threads and set it to the number of partitions of the topic you are consuming.
num.stream.threads (1 default)
Reference: https://docs.confluent.io/current/streams/developer-guide/config-streams.html
P.S.: The default values are taken from the reference above; yours may vary.
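The suggestions in this answer could be expressed roughly as follows. This is a sketch under assumptions: the class and method names are hypothetical, the values passed in are illustrative, and the partition count is supplied by the caller rather than looked up from the cluster. Plain string keys are used so the snippet compiles without Kafka on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: builds the poll-size and thread-count settings from this answer.
public class PollAndThreadsSketch {
    static Properties pollAndThreadProps(int partitionCount, int maxPollRecords) {
        if (partitionCount < 1 || maxPollRecords < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        Properties props = new Properties();
        // Upper bound on the number of records returned by one poll() call.
        props.put("max.poll.records", Integer.toString(maxPollRecords));
        // One stream thread per partition; threads beyond the partition
        // count would have no partitions to work on.
        props.put("num.stream.threads", Integer.toString(partitionCount));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only: 8 partitions, 5000 records per poll.
        pollAndThreadProps(8, 5000).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Whether a larger poll size or more threads actually helps depends on the workload, so it is worth measuring throughput before and after each change.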
Upvotes: 1