Reputation: 980
I have a connector that consumes 2 topics:
one topic with 6 partitions and a second topic with 2 partitions (8 partitions in total to consume).
When I set tasks.max below 6, the partitions are well balanced between the tasks (as seen in the consumer group).
But if I set tasks.max higher than 6, for example 8, then two tasks have no partitions assigned in the consumer group (all 8 tasks are in the RUNNING state), so there are 2 idle tasks.
The offset.storage.topic topic only has 6 partitions.
Is it impossible for a connector to have more active (not merely running) tasks than the number of partitions of the offset.storage.topic topic?
In other words, is the value of offset.storage.partitions related to the maximum number of active connector tasks?
offset.storage.topic :
topic with a large number of partitions (e.g., 25 or 50, just like Kafka’s built-in __consumer_offsets topic) to support large Kafka Connect clusters.
Upvotes: 1
Views: 2025
Reputation: 980
In Kafka Connect 2.7:
The maximum number of active tasks for a sink connector is equal to the partition count of the consumed topic that has the most partitions,
because the sink connector's consumers use the RangeAssignor partition assignment strategy
(it is not related to offset.storage.partitions, the number of partitions of the offset.storage.topic topic).
An active task is a task that has partitions assigned to it in the sink connector's consumer group.
Example:
With 2 topics that each have 10 partitions,
the maximum number of active tasks is 10 (if I set tasks.max to 12, 2 tasks in the consumer group have no partitions to consume).
If I add a third topic with 15 partitions to the connector config, then all 12 tasks receive partitions to consume; if I then set tasks.max to 17, only 15 tasks are active in the consumer group.
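The numbers in this example can be reproduced with a small simulation. This is only a sketch of how a range-style assignment behaves, not Kafka's actual code: the function names (range_assign, round_robin_assign, active_tasks) and topic names are made up for illustration, and it assumes every task subscribes to every topic and that tasks are ordered by index. A round-robin layout is included for contrast.

```python
def range_assign(topics, num_tasks):
    """Range-style assignment: each topic is split independently,
    and the first tasks in the ordering receive its partitions.

    topics: dict of topic name -> partition count (hypothetical names).
    Returns a dict of task index -> list of (topic, partition).
    """
    assignment = {task: [] for task in range(num_tasks)}
    for topic in sorted(topics):
        per_task, extra = divmod(topics[topic], num_tasks)
        for task in range(num_tasks):
            # The first `extra` tasks get one additional partition.
            start = per_task * task + min(task, extra)
            length = per_task + (1 if task < extra else 0)
            assignment[task].extend((topic, p) for p in range(start, start + length))
    return assignment


def round_robin_assign(topics, num_tasks):
    """Round-robin-style assignment: all partitions of all topics are
    laid out in one list and dealt to the tasks in turn."""
    assignment = {task: [] for task in range(num_tasks)}
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for i, topic_partition in enumerate(all_partitions):
        assignment[i % num_tasks].append(topic_partition)
    return assignment


def active_tasks(assignment):
    """Count the tasks that received at least one partition."""
    return sum(1 for partitions in assignment.values() if partitions)


two_topics = {"topic-a": 10, "topic-b": 10}
three_topics = {"topic-a": 10, "topic-b": 10, "topic-c": 15}

print(active_tasks(range_assign(two_topics, 12)))          # 10
print(active_tasks(range_assign(three_topics, 12)))        # 12
print(active_tasks(range_assign(three_topics, 17)))        # 15
print(active_tasks(round_robin_assign(three_topics, 17)))  # 17
```

With a range-style split, the number of active tasks is capped by the largest topic because each topic is divided on its own, starting from the first task every time.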
The only way I found to force an even distribution of the partitions across all members of the consumer group is to set
"consumer.override.partition.assignment.strategy": "org.apache.kafka.clients.consumer.RoundRobinAssignor"
Upvotes: 1
Reputation: 1831
tasks.max
configuration comes from the framework and specifies the maximum number of tasks to be created for the connector. But, fewer tasks may be created. - https://docs.confluent.io/current/connect/managing/configuring.html
In this case, the framework decides that it only needs a certain number of tasks to handle the load, so each task handles a related set of topic-partitions. This is fine from the framework's perspective, since each partition's in-order guarantee is not violated.
If you know more about your load pattern and explicitly want one dedicated task per topic-partition, you can try splitting the config into separate connector configs, one per topic, each with tasks.max set to that topic's number of partitions.
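As a rough sketch of that split, the snippet below builds one connector config per topic, shaped like the name/config body accepted by the Kafka Connect REST API. The helper name per_topic_configs, the connector name "my-sink", the class "com.example.MySinkConnector" and the topic names are all placeholders for illustration, not values from the question.

```python
import json


def per_topic_configs(base_name, connector_class, topic_partitions):
    """Build one connector config per topic, with tasks.max equal to
    that topic's partition count.

    topic_partitions: dict of topic name -> partition count.
    """
    configs = []
    for topic, partitions in sorted(topic_partitions.items()):
        configs.append({
            "name": f"{base_name}-{topic}",
            "config": {
                "connector.class": connector_class,  # placeholder class name
                "topics": topic,
                "tasks.max": str(partitions),
            },
        })
    return configs


# Hypothetical values matching the 6 + 2 partition setup in the question.
configs = per_topic_configs(
    "my-sink",
    "com.example.MySinkConnector",
    {"topic-a": 6, "topic-b": 2},
)
for config in configs:
    print(json.dumps(config, indent=2))
```

Each resulting connector has its own consumer group, so the 6-partition topic and the 2-partition topic are balanced independently of each other.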
offset.storage.partitions
related to max active connector tasks?
-- No, they are not related. You should leave it at 25 (the default) or more; it is better not to touch it.
Upvotes: 1