Reputation: 1585
I am trying to start Kafka connect in distributed mode even in standalone also i am not able to proceed
This is my elastic search sink properties
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=5
topics=fsp-audit
key.ignore=true
connection.url=https://****.amazonaws.com
type.name=kafka-connect
errors.tolerance = all
errors.deadletterqueue.topic.name = fsp-dlq-audit-event
this is my connect-distributed.properties
bootstrap.servers=***:9092,***:9092,***:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
schema.enabled=false
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=10000
plugin.path=/usr/local/confluent/share/java
I also have created three topi upfront
connect-offsets
connect-configs
connect-status
I am running this on EC2 and using MSK as Kafka . I checked connectivity from my EC2 to MSK and i am able to telent
The error i get this
[2020-01-30 08:53:12,126] INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:237)
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[2020-01-30 08:53:12,145] INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:237)
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[2020-01-30 08:53:12,149] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed:83)
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:64)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:45)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:94)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:77)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:58)
Question :If i have to run Kafka connect in distributed mode,do i have to use more than one EC2/vm ?
Upvotes: 0
Views: 1137
Reputation: 1585
Ok after looking into more details i found out that issue was in NACL which was blocking few subnets ip addresses .
So, I checked the MSK side Security group/Network ACL/Route table and found them to be fine. This means that the issue could be with the EC2 instance so I checked the Security group/Route table of the instance and found them to be properly configured.
However, on checking the Network ACL (acl-***)
attached with the EC2 instance, I saw that there is an Inbound rule allowing 0.0.0.0/0 for the ephemeral ports which should allow the brokers to talk to the EC2 instance. However, looking at the Outbound rules, I saw it allowing only the subnet range where b-2 is present but it didn't have any explicit Outbound rule to allow either b-3 (10.**.**.0/24)
or b-4 (10.**.**.0/24)
subnet ranges.
When i added new rule then i was able to ping and connect success fully
Upvotes: 1
Reputation: 191728
If i have to run Kafka connect in distributed mode,do i have to use more than one EC2/vm ?
No, you can run one "psuedo-distributed" instance. The main differerence between standalone and distributed is how it handles storage of offsets and configurations.
Upvotes: 0