hster
hster

Reputation: 41

How do Kafka Connect workers allocate manage resource limits (memory/cores) to distribute tasks?

In Kubernetes, you explicitly specify the resource limits for a container. In launching a Kafka connector, you request max tasks but how does the connect worker cluster know how to distribute the load? Does it consider the tasks as equal? Does it use internal metrics?

The Apache Kafka docs and the confluent docs do not explicitly say except Confluent advises the following which would indicate connect workers do not have resource management:

The resource limit depends heavily on the types of connectors being run by the workers, but in most cases users should be aware of CPU and memory bounds when running workers concurrently on a single machine.

https://docs.confluent.io/3.1.2/connect/userguide.html#connect-standalone-v-distributed

Also the cluster deployment appears to require an external resource manager to handle failover of workers.

Kafka Connect workers can be deployed in a number of ways, each with their own benefits. Workers lend themselves well to being run in containers in managed environments such as YARN, Mesos, or Docker Swarm as all state is stored in Kafka, making the local processes themselves stateless. We provide Docker images and documentation for getting started with those images is here. By design, Kafka Connect does not automatically handle restarting or scaling workers which means your existing clustering solutions can continue to be used transparently.

Upvotes: 1

Views: 1215

Answers (1)

Robin Moffatt
Robin Moffatt

Reputation: 32110

how does the connect worker cluster know how to distribute the load

Each connector can opt to partition its work into tasks (for example, ingesting multiple tables from one database could be done in parallel and so one table would be done by one task), up to the tasks.max limit configured.

Kafka Connect balances these tasks across the available workers such that they are evenly distributed (based on the number of tasks).

The rebalancing protocol changed in release 2.3 of Apache Kafka as part of KIP-415, there are details in the KIP and here. In a nutshell, with incremental cooperative rebalancing Kafka Connect spreads the tasks equally starting from the least loaded workers, eventually including more workers while the load evens out.

Also the cluster deployment appears to require an external resource manager to handle failover of workers.

To be clear - the failover of tasks is done automatically by Kafka Connect, and as you say, the failover of workers would be managed externally.

Upvotes: 1

Related Questions