Xiang Zhang

Reputation: 2973

How do I set the JVM heap size for Kafka Connect connectors and tasks?

Does Kafka Connect start a new connector and its tasks within the Kafka Connect process, or does it fork a new JVM process?

If it starts the plugin within the Kafka Connect process, then I need to set the Kafka Connect JVM heap size via KAFKA_CONNECT_JVM_HEAP_OPT (using the Confluent Docker image). The problem is that if I start many tasks or many connectors, they will all share the JVM heap, which makes it hard to decide on a heap size for Kafka Connect.

If Kafka Connect starts each connector in a new JVM process, how can I set the heap size for those processes?

Upvotes: 7

Views: 9736

Answers (2)

OneCricketeer

Reputation: 191874

All tasks share the memory space of one worker's JVM on the host OS; whether that host is a container doesn't really matter (other than the fact that, without JVM flags on the process inside the container, memory is limited even further).

You "add memory" to your Connect cluster by adding more workers. You prevent OOM errors by increasing topic partitions and connector tasks, reducing poll/batch sizes, and reducing the overall amount of data each worker needs to read.
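As a sketch of raising task parallelism, a connector's tasks.max can be changed through the Connect REST API. The connector name, host/port, and the file-sink settings below are assumptions for illustration, not from the answer:

```shell
# Hypothetical: raise the parallelism of a file-sink connector to 4 tasks.
# (Connector name "my-file-sink", localhost:8083, and the sink settings
# are assumed values for this sketch.)
curl -X PUT -H "Content-Type: application/json" \
  http://localhost:8083/connectors/my-file-sink/config \
  -d '{
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "file": "/tmp/out.txt",
        "topics": "my-topic",
        "tasks.max": "4"
      }'
```

More tasks only help if the source topic has at least that many partitions, since each task consumes a disjoint set of partitions.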

The environment variable for Connect's heap settings is KAFKA_HEAP_OPTS, and you can add more JVM flags via KAFKA_OPTS.
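A minimal sketch of how those two variables might be set before launching a worker (the heap and GC values are illustrative, not recommendations):

```shell
# Heap for the single worker JVM; every connector and task on this
# worker shares it.
export KAFKA_HEAP_OPTS="-Xms512M -Xmx4G"
# Any additional JVM flags, e.g. GC tuning, go in KAFKA_OPTS.
export KAFKA_OPTS="-XX:+UseG1GC"
# The worker startup script picks both variables up, e.g.:
# bin/connect-distributed.sh config/connect-distributed.properties
echo "$KAFKA_HEAP_OPTS"
```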

Upvotes: 4

Konstantine Karantasis

Reputation: 1993

Kafka Connect has basic support for multi-tenancy. Specifically, you are able to bundle several connector instances within the same Connect worker.

Each Connect worker always maps to a single JVM instance, and a request to start a new connector does not result in spawning a new JVM. However, Connect workers with the same group.id form a Connect cluster, and connector tasks are distributed among the workers in that cluster.

A Connect worker's heap size can be easily set using:

export KAFKA_HEAP_OPTS="-Xms256M -Xmx2G" (this example uses the default values)

or, when a docker image is used, by setting:

-e CONNECT_KAFKA_HEAP_OPTS="-Xms256M -Xmx2G" (again this example uses the default values)
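As a hedged sketch, a docker run invocation with that setting might look like the following (the image tag, container name, and broker address are assumptions, and other required CONNECT_* settings are omitted for brevity):

```shell
# Hypothetical docker run for a Connect worker with an explicit heap.
docker run -d --name connect \
  -e CONNECT_KAFKA_HEAP_OPTS="-Xms512M -Xmx4G" \
  -e CONNECT_BOOTSTRAP_SERVERS="kafka:9092" \
  -e CONNECT_GROUP_ID="connect-cluster" \
  confluentinc/cp-kafka-connect:7.5.0
```

This only works because the Confluent image maps CONNECT_-prefixed environment variables onto the worker process; a plain Kafka tarball uses KAFKA_HEAP_OPTS directly instead.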

Connect workers can be scaled horizontally. Adding more workers to a Connect cluster adds memory and computing resources to your deployment. If you need to apply a more specific and tight memory budget to your Connect deployment, you might choose to assign specific connectors to each Connect cluster, or even, in some cases, deploy one connector instance per Connect cluster.

Upvotes: 7
