Reputation: 453
Hello, I'm trying to run a Dataflow job that reads events from a Kafka cluster hosted outside GCP; the job runs on a VPC network.
The problem is that the Kafka brokers are configured to advertise hostnames instead of IPs, so even when I specify IPs in the bootstrap servers the client fails to connect to the target node when the job runs on Dataflow:
Reader-1: Timeout while initializing partition 'placeholder'. Kafka client may not be able to connect to servers.
On the other hand, if I create a VM with a Kafka client and put the hostname-to-IP mappings in /etc/hosts, I'm able to consume correctly.
To make it work on Dataflow I tried to create a private Cloud DNS zone with DNS name = "." ; in that zone I can add one entry per host, mapping DNS name = "nodename1." to data = IP1.
This seems to work, as I'm able to telnet nodename1 from a VM where I didn't specify the mapping in /etc/hosts.
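For reference, the same check can also be done with a couple of lines of Python from a VM in the VPC (just a sketch; nodename1 stands for one of the broker hostnames):

```python
import socket

# Resolve the Kafka broker hostname through the private Cloud DNS zone;
# this should print IP1 if the VPC is picking up the zone and its record.
print(socket.gethostbyname("nodename1"))
```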
However, the job gets stuck right at the beginning and the only error I get is:
"Timeout in polling result file: gs://placeholder/staging,zone=europe-central2-a/template_launches/2021-09-28_01_43_38-12780367774525067695/operation_result.
Possible causes are:
1. Your launch takes too long time to finish. Please check the logs on stackdriver.
2. Service account placeholder@placeholder may not have enough permissions to pull container image gcr.io/dataflow-templates/2021-09-20-00_rc00/kafka-to-bigquery or create new objects in gs://placeholder.
3. Transient errors occurred, please try again."
Is there an easy way to map hostnames to IPs for a Dataflow job?
Upvotes: 0
Views: 365
Reputation: 453
I figured this one out.
The correct way was not to create a single zone with DNS name = "." , since that makes GCP look into this zone to resolve every domain, and it contains nothing other than the name-to-IP entries I created. Since Dataflow workers query the metadata server (169.254.169.254) internally, that lookup was also forwarded to the zone I created and could not be resolved, which resulted in the job hanging.
The correct way was to create a separate zone for each Kafka "nodename", with DNS name = "nodename." , and then add a single record in it mapping "nodename." to the broker's IP.
By repeating this for every bootstrapped nodename that Dataflow was not able to resolve automatically, I was able to correctly consume from the Kafka topic.
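For reference, this is roughly what the per-hostname setup looks like through the Cloud DNS v1 API using the Python google-api-python-client (just a sketch: the project, VPC network, hostname and IP below are placeholders, application-default credentials are assumed, and gcloud or the Cloud Console work just as well):

```python
from googleapiclient import discovery

# Placeholders: replace with your own project, VPC network, broker hostname and IP.
PROJECT = "my-project"
NETWORK_URL = "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/my-vpc"
NODENAME = "nodename1"
BROKER_IP = "10.0.0.1"  # i.e. IP1

dns = discovery.build("dns", "v1")

# One private managed zone per Kafka broker hostname, visible only to the VPC
# the Dataflow workers run in.
zone_body = {
    "name": f"kafka-{NODENAME}",
    "dnsName": f"{NODENAME}.",  # e.g. "nodename1."
    "description": f"Private zone for Kafka broker {NODENAME}",
    "visibility": "private",
    "privateVisibilityConfig": {"networks": [{"networkUrl": NETWORK_URL}]},
}
dns.managedZones().create(project=PROJECT, body=zone_body).execute()

# A single A record at the zone apex mapping the bare hostname to the broker IP.
change_body = {
    "additions": [
        {"name": f"{NODENAME}.", "type": "A", "ttl": 300, "rrdatas": [BROKER_IP]}
    ]
}
dns.changes().create(
    project=PROJECT, managedZone=zone_body["name"], body=change_body
).execute()
```

The key point is that each zone only overrides its own hostname, so everything else (including the metadata hostnames the Dataflow workers rely on) keeps resolving through the default GCP DNS.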
Upvotes: 1