GMA
GMA

Reputation: 6096

Can Kafka Connect be made rack aware so that my connector reads all partitions from one broker?

We have a Kafka cluster in Amazon MSK that has 3 brokers in different availability zones of the same region. I want to set up a Kafka Connect connector that backs up all data from our Kafka brokers to Amazon S3, and I'm trying to do it with MSK Connect.

I set up Confluent's S3 Sink Connector on MSK Connect and it works - everything is uploaded to S3 as expected. The problem is that it costs a fortune in data transfer charges - our AWS bills for MSK nearly double whenever the connector is running, with EU-DataTransfer-Regional-Bytes accounting for the entire increase.

It seems that the connector is pulling messages from all three of our brokers, i.e. from three different AZs, and so we're getting billed for inter-AZ data transfer. This makes sense because by default it will read a partition from that partition's leader, which could be any of the three brokers. But if we were creating a normal consumer, not a connector, it would be possible to restrict the consumer to read all partitions from a specific broker:

"client.rack" : "euw1-az3"

☝️ For a consumer in the euw1-az3 AZ, this setting makes the consumer read all partitions from the local broker, regardless of the partitions' leader - which avoids the need for inter-AZ data transfer and brings the bills down massively.

My question is, is it possible to do something similar for a Kafka Connector? What config setting do I have to pass to the connector, or the worker, to make it only read from one specific broker/AZ? Is this possible with MSK Connect?

Upvotes: 2

Views: 1190

Answers (2)

GMA
GMA

Reputation: 6096

AWS confirmed to me on a call with support that MSK Connect doesn't currently support rack awareness. I was able to solve my problem by deploying the connector in an EC2 instance (not on MSK Connect) with the connect worker config consumer.client.rack set to the same availability zone that the EC2 instance is running in.

Upvotes: 0

Raul Lapeira Herrero
Raul Lapeira Herrero

Reputation: 987

Maybe I am missing something about your question. I think you want to have a look at this:

https://docs.confluent.io/platform/current/tutorials/examples/multiregion/docs/multiregion.html

replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

I though it was general knowledge, it applies to any on-premises or cloud deployment.

Upvotes: 0

Related Questions