Reputation: 21
Kafka Connect source and sink connectors provide a practically ideal feature set for configuring a data pipeline without writing any code. In my case, I wanted to use them to integrate data from several DB servers (producers) located on the public Internet.
However, some producers don't have direct access to the Kafka brokers, because their network/firewall configuration only allows traffic to one specific host (port 443). Unfortunately, I cannot change these settings.
My first thought was to use Confluent REST Proxy, but I learned that Kafka Connect uses the KafkaProducer API, so it needs direct access to the brokers.
I found a couple of possible workarounds, but none is perfect:
Has anyone faced a similar challenge? How did you solve it?
Upvotes: 1
Views: 2556
Reputation: 21
As @OneCricketeer recommended, I tried the approach of an HTTP Sink Connector posting to REST Proxy. I managed to configure both the Confluent HTTP Sink connector and an alternative one (github.com/llofberg/kafka-connect-rest) to work with Confluent REST Proxy.
I'm adding the connector configurations in case they save some time for anyone trying this approach.
Confluent HTTP Sink connector
{
  "name": "connector-sink-rest",
  "config": {
    "topics": "test",
    "tasks.max": "1",
    "connector.class": "io.confluent.connect.http.HttpSinkConnector",
    "headers": "Content-Type:application/vnd.kafka.json.v2+json",
    "http.api.url": "http://rest:8082/topics/test",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter.schemas.enable": "false",
    "batch.prefix": "{\"records\":[",
    "batch.suffix": "]}",
    "batch.max.size": "1",
    "regex.patterns": "^~$",
    "regex.replacements": "{\"value\":~}",
    "regex.separator": "~",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1"
  }
}
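To make the `batch.*` and `regex.*` settings above easier to follow, here is a small Python sketch of how I understand the connector to build the request body. This is my reading of the config, not the connector's actual code: `regex.separator` splits `regex.patterns` into `^` and `$` and `regex.replacements` into `{"value":` and `}`, so each record is wrapped in `{"value":...}`, and then the batch prefix and suffix add the `records` array. The function name `wrap_record` is made up for illustration.

```python
import json
import re

# Values copied from the connector config above.
BATCH_PREFIX = '{"records":['
BATCH_SUFFIX = ']}'
REGEX_SEPARATOR = "~"
REGEX_PATTERNS = "^~$"
REGEX_REPLACEMENTS = '{"value":~}'


def wrap_record(record_value: str) -> str:
    """Emulate (as an assumption) how one record becomes a REST Proxy body.

    record_value is expected to be a single-line JSON string without a
    trailing newline, as produced by StringConverter for a JSON message.
    """
    patterns = REGEX_PATTERNS.split(REGEX_SEPARATOR)          # ["^", "$"]
    replacements = REGEX_REPLACEMENTS.split(REGEX_SEPARATOR)  # ['{"value":', "}"]
    body = record_value
    # Apply each pattern/replacement pair in order: "^" prepends, "$" appends.
    for pattern, replacement in zip(patterns, replacements):
        body = re.sub(pattern, replacement, body)
    # batch.max.size is 1, so the batch is just this one wrapped record.
    return BATCH_PREFIX + body + BATCH_SUFFIX


if __name__ == "__main__":
    body = wrap_record('{"id": 1}')
    print(body)  # {"records":[{"value":{"id": 1}}]}
    json.loads(body)  # raises if the result is not valid JSON
```

Note that with `batch.max.size` set to 1 every record becomes its own HTTP request. A larger batch would need a separator between the wrapped records, which I have not tested.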
Kafka Connect REST connector
{
  "name": "connector-sink-rest-v2",
  "config": {
    "connector.class": "com.tm.kafka.connect.rest.RestSinkConnector",
    "tasks.max": "1",
    "topics": "test",
    "rest.sink.url": "http://rest:8082/topics/test",
    "rest.sink.method": "POST",
    "rest.sink.headers": "Content-Type:application/vnd.kafka.json.v2+json",
    "transforms": "velocityEval",
    "transforms.velocityEval.type": "org.apache.kafka.connect.transforms.VelocityEval$Value",
    "transforms.velocityEval.template": "{\"records\":[{\"value\":$value}]}",
    "transforms.velocityEval.context": "{}"
  }
}
Upvotes: 1
Reputation: 192013
Sink Connectors (ones that write to external systems) do not use the Producer API.
That being said, you could use some HTTP Sink Connector that issues POST requests to the REST Proxy endpoint. It's not ideal, but it would address the problem. Note: This means you have two clusters - one that you are consuming from in order to issue HTTP requests via Connect, and the other behind the proxy.
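For reference, the request such a sink connector has to issue looks roughly like the sketch below. It only uses the Python standard library, and it assumes the REST Proxy v2 JSON format with the `rest:8082` host and `test` topic taken from the configs posted in this thread. The helper names `build_produce_request` and `send` are made up for illustration.

```python
import json
import urllib.request

# Content type for JSON-embedded records in REST Proxy API v2.
CONTENT_TYPE = "application/vnd.kafka.json.v2+json"


def build_produce_request(base_url, topic, values):
    """Build (but do not send) a POST that produces `values` to `topic`."""
    payload = {"records": [{"value": value} for value in values]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/topics/{topic}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and return the HTTP status code.

    Raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Assumed host and topic; adjust to your own REST Proxy endpoint.
    req = build_produce_request("http://rest:8082", "test", [{"id": 1}])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Calling `send(req)` performs the actual POST. In the setup from the question, the URL would point at the one reachable host on port 443 rather than directly at `rest:8082`.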
Overall, I don't see how the question is unique to Connect, since you'd have similar issues with any other attempt to write the data to Kafka through the only open HTTPS port.
Upvotes: 1