Sudarshan kumar

Reputation: 1585

Not able to understand Kafka Connect in distributed mode

I started Kafka Connect in standalone mode like this:

/usr/local/confluent/bin/connect-standalone /usr/local/confluent/etc/kafka/connect-standalone.properties /usr/local/confluent/etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

After that I created a connector with all the details using the REST API, like this:

curl  -X POST -H "Content-Type: application/json" --data '{"name":"elastic-search-sink-audit","config":{"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector","tasks.max":"5","topics":"fsp-AUDIT_EVENT_DEMO","key.ignore":"true","connection.url":"https://**.amazonaws.com","type.name":"kafka-connect-distributed","name":"elastic-search-sink-audit","errors.tolerance":"all","errors.deadletterqueue.topic.name":"fsp-dlq-audit-event"}}' http://localhost:8083/connectors | jq

After that, when I check the status, I can see 5 tasks running:

curl  localhost:8083/connectors/elastic-search-sink-audit/tasks | jq

Question 1:

Does this mean I am running my Kafka Connect connector in distributed mode, or still only in standalone mode?

Question 2:

Do I have to modify the connect-distributed.properties file and start it the same way as standalone?

Question 3:

Currently I am running my whole setup on only one EC2 instance. Now, if I add 5 more EC2 instances to make the connector more parallel and speed things up, how do I do that? How will Kafka Connect understand that 5 more EC2 instances have been added and that it has to share the workload?

Question 4: Do I have to run Kafka Connect and create the connector on every EC2 instance, or just start the workers? How can I confirm that all 5 EC2 instances are running the same connector properly?

Lastly, I have tried starting the connector in distributed mode. First I started it like this:

/usr/local/confluent/bin/connect-distributed /usr/local/confluent/etc/kafka/connect-distributed.properties /usr/local/confluent/etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties

and then, in another session, I submitted the connector using the REST API like this:

curl  -X POST -H "Content-Type: application/json" --data '{"name":"elastic-search-sink-audit","config":{"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector","tasks.max":"5","topics":"fsp-AUDIT_EVENT_DEMO","key.ignore":"true","connection.url":"https://**.amazonaws.com","type.name":"kafka-connect-distributed","name":"elastic-search-sink-audit","errors.tolerance":"all","errors.deadletterqueue.topic.name":"fsp-dlq-audit-event"}}' http://localhost:8083/connectors | jq

But as soon as I hit this, I started getting errors like this:

Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,551] WARN [Producer clientId=producer-3] Got error produce response with correlation id 159 on topic-partition connect-configs-0, retrying (2147483496 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,652] WARN [Producer clientId=producer-3] Got error produce response with correlation id 160 on topic-partition connect-configs-0, retrying (2147483495 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,753] WARN [Producer clientId=producer-3] Got error produce response with correlation id 161 on topic-partition connect-configs-0, retrying (2147483494 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,854] WARN [Producer clientId=producer-3] Got error produce response with correlation id 162 on topic-partition connect-configs-0, retrying (2147483493 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)
[2020-02-01 13:48:15,956] WARN [Producer clientId=producer-3] Got error produce response with correlation id 163 on topic-partition connect-configs-0, retrying (2147483492 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:598)

Finally, the request times out when I try to create the connector using curl:

{ "error_code": 500, "message": "Request timed out" }

Please help me understand this.

Upvotes: 1

Views: 1571

Answers (1)

OneCricketeer

Reputation: 192023

Both modes start a REST API
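
For example, the same standard Connect REST endpoints answer on port 8083 (the port from your commands) whichever script you used to start the worker:

curl localhost:8083/ | jq            # worker version and the Kafka cluster id it is connected to
curl localhost:8083/connectors | jq  # connectors known to this worker / cluster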

Distributed mode does not accept a properties file for the connectors; you must POST the connector configuration to the REST API. There is no reason to do that in standalone mode, because the connectors you provide on the command line are already running.
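
As a sketch using the paths from your question: start the distributed worker with only the worker properties file, then submit the connector JSON over REST. The @file below is just a hypothetical file holding the same JSON you already pass with --data:

/usr/local/confluent/bin/connect-distributed /usr/local/confluent/etc/kafka/connect-distributed.properties

curl -X POST -H "Content-Type: application/json" --data @elastic-search-sink-audit.json http://localhost:8083/connectors | jq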

Distributed mode is recommended because the state of the connectors is stored back into a Kafka topic rather than maintained in files on the single machine running standalone mode

For more details, please refer to Kafka Connect Concepts.

How will Kafka Connect understand that 5 more EC2 instances have been added and that it has to share the workload?

Do I have to run Kafka Connect and create the connector on every EC2 instance, or just start the workers? How can I confirm that all 5 EC2 instances are running the same connector properly?

Well, your EC2 machines don't know to start any process unless they're part of some distributed cluster, so you must start distributed mode on each, using the same settings (Confluent's Ansible repo makes this really easy).
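
Concretely, every worker that should join the same Connect cluster needs the same group.id and the same internal storage topics in its connect-distributed.properties. A minimal sketch, with placeholder broker addresses and the usual default topic names (adjust to your cluster):

bootstrap.servers=your-broker-1:9092,your-broker-2:9092
group.id=connect-cluster              # identical on all EC2 instances, so they form one Connect cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

With that in place, you run the same connect-distributed command on each instance; the workers discover each other through the group and rebalance the connector's 5 tasks among themselves.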

You can hit the /status endpoint of any Connect server to see which addresses are running which tasks
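
For example, with your connector name, the worker_id field in the response shows which host:port is running the connector and each of its tasks:

curl localhost:8083/connectors/elastic-search-sink-audit/status | jq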

NOT_ENOUGH_REPLICAS

Because you don't have enough brokers to satisfy the replication settings of the internal Kafka Connect topics used for tracking state
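
If this is a small or single-broker cluster, one common workaround, assuming you can let the worker (re)create the internal topics, is to lower the replication factor it requests in connect-distributed.properties (use 1 only for a dev setup, and also check that the brokers' min.insync.replicas can be met):

config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1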

Upvotes: 3
