DataJanitor

Reputation: 486

kafka-connect - s3-connector - estimating required JVM heap size

I'm trying to productionize Kafka Connect in our environment. For infrastructure-requirements purposes, I'm looking for a way to estimate the required JVM heap size per node. I have two topics that I would like to sink to S3 with the S3 connector, but I don't see any good articles on how to arrive at these estimates. Can someone please guide me?

Upvotes: 2

Views: 392

Answers (1)

OneCricketeer

Reputation: 191874

There is no good guide because the connector is so configurable; memory use depends on how several settings interact.

For example, each task (up to tasks.max of them) will batch records up to the flush size (flush.size), then write that batch to storage.
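For a concrete picture, here's a minimal Python sketch that builds the JSON payload you would POST to the Connect REST API; the connector name, topic names, bucket, and every value are placeholders, not recommendations:

```python
import json

# Hypothetical S3 sink config; all values below are illustrative only.
connector = {
    "name": "s3-sink",  # placeholder connector name
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "topic-a,topic-b",  # the two topics being sunk
        "tasks.max": "2",             # upper bound on parallel tasks
        "flush.size": "10000",        # records buffered per topic-partition before a write
        "s3.bucket.name": "my-bucket",
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    },
}
print(json.dumps(connector, indent=2))
```

Records sit in memory until flush.size is reached, so heap pressure scales with that value times the number of topic-partitions each task owns.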

If you are using the DefaultPartitioner, you could estimate how many records you're storing per partition, how many tasks will run per node, and how many topics you're consuming in total, and come up with a rough number.
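As a rough back-of-envelope for that estimate (every number below is a made-up placeholder to show the arithmetic, not a sizing recommendation):

```python
# Back-of-envelope heap estimate for the DefaultPartitioner case.
avg_record_bytes = 2_000    # measured average serialized record size
flush_size = 10_000         # flush.size: records buffered per topic-partition
partitions_per_task = 6     # topic-partitions assigned to one task
tasks_per_node = 2          # tasks scheduled on one worker node

buffer_per_partition = avg_record_bytes * flush_size
heap_bytes = buffer_per_partition * partitions_per_task * tasks_per_node

print(f"~{heap_bytes / 1024**2:.0f} MiB of record buffers per node "
      "(framework overhead not included; leave generous headroom)")
```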

If you're using the TimeBasedPartitioner, then you'll need to account for the partition duration (partition.duration.ms) and the scheduled rotate interval (rotate.schedule.interval.ms). I can say that 8 GB of RAM is capable of writing multi-GB files from a few partitions with hourly partitioning, so I don't think you need much more heap than that to start.
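If it helps, here's a sketch of the TimeBasedPartitioner settings that govern how long records accumulate on the heap; the values are illustrative placeholders:

```python
# Hypothetical overrides added to the sink config sketched above.
timebased = {
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",       # hourly output partitions
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "locale": "en-US",
    "timezone": "UTC",
    "rotate.schedule.interval.ms": "600000",  # force a flush every 10 minutes
}
```

A shorter rotate interval bounds how much data a task can buffer before writing, at the cost of more, smaller files in S3.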

As far as other documentation goes, there's a decent description in this GitHub issue: https://github.com/confluentinc/kafka-connect-storage-cloud/issues/177

Upvotes: 1
