Reputation: 12853
I'm running a Kafka Streams application with three sub-topologies. The stages of activity are roughly as follows:
stream
Topic AselectKey
and repartition Topic A to
Topic Bstream
Topic Bforeach
Topic B to Topic C Producer
stream
Topic Cto
Topic DTopics A, B, and C are each materialized, which means that if each topic has 40 partitions, my maximum parallelism is 120.
At first I was running 5 streams applications with 8 threads a piece. With this set up I was experiencing inconsistent performance. It seems like some sub-topologies sharing the same thread were hungrier for CPU than others and after a while, I'd get this error: Member [client_id] in group [consumer_group] has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
. Everything would get rebalanced, which could lead to decreased performance until the next failure and rebalance.
My questions are as follows:
Upvotes: 0
Views: 2312
Reputation: 62350
The thread will poll() for all topics of different sub-topologies and check the records topic
metadata to feed it into the correct task.
Each sub-topology is treated the same, ie, available resources are evenly distributed if you wish.
A 1:1 ratio is only useful if you have enough cores. I would recommend to monitor your CPU utilization. If it's too high (larger >80%) you should add more cores/threads.
Kafka Streams handles this for you automatically.
Couple of general comments:
max.poll.interval.ms
config to avoid that a consumer drops out of the groupmax.poll.records
to get less records per poll()
call, and thus decrease the time between two consecutive calls to poll()
.max.poll.records
does not imply increases network/broker communication -- if a single fetch request return more records than max.poll.records
config, the data is just buffered within the consumer and the next poll()
will be served from the buffered data avoiding a broker round tripUpvotes: 2