Sachin Shetty

Reputation: 49

How to disable public IPs in a predefined template for a Dataflow job launch

I am trying to deploy a Dataflow job from one of Google's predefined templates using the Python API.

I do not want my Dataflow compute instances to have public IPs, so I use something like this:

from googleapiclient.discovery import build

# PROJECT, TOPIC, BUCKET, JOBNAME and SUBNETWORK are defined elsewhere.
service = build("dataflow", "v1b3")

GCSPATH = "gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text"
BODY = {
    "jobName": "{jobname}".format(jobname=JOBNAME),
    "parameters": {
        "inputTopic": "projects/{project}/topics/{topic}".format(project=PROJECT, topic=TOPIC),
        "outputDirectory": "gs://{bucket}/pubsub-backup-v2/{topic}/".format(bucket=BUCKET, topic=TOPIC),
        "outputFilenamePrefix": "{topic}-".format(topic=TOPIC),
        "outputFilenameSuffix": ".txt",
    },
    "environment": {
        "machineType": "n1-standard-1",
        "usePublicIps": False,  # this is the field the API rejects (see error below)
        "subnetwork": SUBNETWORK,
    },
}

request = service.projects().templates().launch(projectId=PROJECT, gcsPath=GCSPATH, body=BODY)
response = request.execute()

but I get this error:

raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://dataflow.googleapis.com/v1b3/projects/ABC/templates:launch?alt=json&gcsPath=gs%3A%2F%2Fdataflow-templates%2Flatest%2FCloud_PubSub_to_GCS_Text returned "Invalid JSON payload received. Unknown name "use_public_ips" at 'launch_parameters.environment': Cannot find field.">

If I remove usePublicIps, the request goes through, but the compute instance is deployed with a public IP.

Upvotes: 0

Views: 4941

Answers (5)

p13rr0m

Reputation: 1297

Besides all the other methods mentioned so far, gcloud dataflow jobs run and gcloud dataflow flex-template run define the optional flag --disable-public-ips.
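A minimal sketch of launching the same classic template through gcloud from Python instead of the REST client. The job name, region, subnetwork and parameter values are hypothetical placeholders, and the helper assumes no parameter value contains a comma (gcloud splits --parameters on commas).

```python
import shlex


def build_gcloud_run_command(job_name, template_path, region, subnetwork, parameters):
    """Build argv for `gcloud dataflow jobs run` with worker public IPs disabled."""
    # Template parameters are passed as one comma-separated KEY=VALUE list.
    params = ",".join(f"{key}={value}" for key, value in parameters.items())
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        f"--gcs-location={template_path}",
        f"--region={region}",
        f"--subnetwork={subnetwork}",
        "--disable-public-ips",  # workers get private IPs only
        f"--parameters={params}",
    ]


if __name__ == "__main__":
    # Hypothetical placeholder values; replace with your own project settings.
    cmd = build_gcloud_run_command(
        job_name="pubsub-backup",
        template_path="gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text",
        region="us-central1",
        subnetwork="regions/us-central1/subnetworks/my-subnet",
        parameters={
            "inputTopic": "projects/my-project/topics/my-topic",
            "outputDirectory": "gs://my-bucket/pubsub-backup-v2/my-topic/",
        },
    )
    print(shlex.join(cmd))
```

To actually launch the job, pass the list to subprocess.run(cmd, check=True) instead of printing it.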

Upvotes: 0

user2179539

Reputation: 161

It seems you are using the JSON from projects.locations.templates.create. The environment block (documented as RuntimeEnvironment) needs to follow this form:

"environment": {
    "machineType": "n1-standard-1",
    "ipConfiguration": "WORKER_IP_PRIVATE",
    "subnetwork": SUBNETWORK  # sample: regions/${REGION}/subnetworks/${SUBNET}
}

The value for ipConfiguration is an enum documented at Job.WorkerIPAddressConfiguration; WORKER_IP_PRIVATE requests workers with private IP addresses only.
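Applied to the question's code, a minimal sketch of the corrected request might look like this. The function names are hypothetical helpers; `service` is assumed to be the client created with googleapiclient.discovery.build("dataflow", "v1b3"), as in the question.

```python
def build_launch_body(job_name, project, topic, bucket, subnetwork):
    """Request body for projects.templates.launch with private-IP workers only."""
    return {
        "jobName": job_name,
        "parameters": {
            "inputTopic": f"projects/{project}/topics/{topic}",
            "outputDirectory": f"gs://{bucket}/pubsub-backup-v2/{topic}/",
            "outputFilenamePrefix": f"{topic}-",
            "outputFilenameSuffix": ".txt",
        },
        "environment": {
            "machineType": "n1-standard-1",
            # RuntimeEnvironment has no usePublicIps field; this enum replaces it.
            "ipConfiguration": "WORKER_IP_PRIVATE",
            "subnetwork": subnetwork,  # e.g. regions/REGION/subnetworks/SUBNET
        },
    }


def launch_template(service, project, gcs_path, body):
    """Launch the template; `service` is a googleapiclient Dataflow v1b3 client."""
    request = service.projects().templates().launch(
        projectId=project, gcsPath=gcs_path, body=body
    )
    return request.execute()
```

Keeping the body construction separate from the API call makes it easy to inspect the payload before sending it.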

Upvotes: 4

Ian Raphael

Reputation: 76

The usePublicIps parameter cannot be overridden at runtime. You need to pass it with the value false to the Dataflow template generation command:

mvn compile exec:java -Dexec.mainClass=class -Dexec.args="--project=$PROJECT \
--runner=DataflowRunner --stagingLocation=bucket --templateLocation=bucket \
--usePublicIps=false"

This adds an ipConfiguration entry to the template's JSON indicating that workers should use private IPs only.

The links are screenshots of the template JSON with and without the ipConfiguration entry.

Template with usePublicIps=false

Template without usePublicIps=false
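To check a generated template yourself, a minimal sketch like the following could be run on a local copy of the template file (downloaded, for example, with gsutil cp). It searches recursively for the ipConfiguration entry described above, so it does not assume where in the file the entry sits; the assumption that the private setting appears as WORKER_IP_PRIVATE is mine, and the nested layout in the test data is hypothetical.

```python
import json


def find_ip_configuration(node):
    """Return the first "ipConfiguration" value anywhere in parsed JSON, else None."""
    if isinstance(node, dict):
        if "ipConfiguration" in node:
            return node["ipConfiguration"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain the key
    for child in children:
        found = find_ip_configuration(child)
        if found is not None:
            return found
    return None


def template_uses_private_ips(template_text):
    """True if the template JSON requests private-IP workers (assumed enum value)."""
    return find_ip_configuration(json.loads(template_text)) == "WORKER_IP_PRIVATE"
```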

Upvotes: 3

Sachin Shetty

Reputation: 49

I found one way to make this work:

  1. Clone the Google-provided templates

  2. Run the template with custom parameters

mvn compile exec:java \
 -Dexec.mainClass=com.google.cloud.teleport.templates.PubsubToText \
 -Dexec.cleanupDaemonThreads=false \
 -Dexec.args=" \
 --project=${PROJECT_ID} \
 --stagingLocation=gs://${BUCKET}/dataflow/pipelines/${PIPELINE_FOLDER}/staging \
 --tempLocation=gs://${BUCKET}/dataflow/pipelines/${PIPELINE_FOLDER}/temp \
 --runner=DataflowRunner \
 --windowDuration=2m \
 --numShards=1 \
 --inputTopic=projects/${PROJECT_ID}/topics/$TOPIC \
 --outputDirectory=gs://${BUCKET}/temp/ \
 --outputFilenamePrefix=windowed-file \
 --outputFilenameSuffix=.txt \
 --workerMachineType=n1-standard-1 \
 --subnetwork=${SUBNET} \
 --usePublicIps=false"

Upvotes: 0

Nahuel Varela

Reputation: 1040

Reading the docs for Specifying your Network and Subnetwork on Dataflow, I see that Python uses use_public_ips=false instead of usePublicIps=false, which is what Java uses. Try changing that parameter.
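A minimal sketch of the Python-side flags, assuming you run a Beam Python pipeline directly rather than launching a prebuilt template (the templates.launch request in the question still rejects this field). In the Beam Python SDK the private-IP setting is passed as the boolean flag --no_use_public_ips; the helper name and argument values are hypothetical.

```python
def dataflow_private_ip_args(project, region, subnetwork, temp_location):
    """Pipeline arguments for a Beam Python job whose Dataflow workers get no public IPs."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--subnetwork={subnetwork}",
        # Python counterpart of Java's --usePublicIps=false.
        "--no_use_public_ips",
    ]
```

The resulting list would be passed to apache_beam.options.pipeline_options.PipelineOptions(args), after which options.view_as(WorkerOptions).use_public_ips should be False.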

Also, keep in mind that:

When you turn off public IP addresses, the Cloud Dataflow pipeline can access resources only in the following places:

  • another instance in the same VPC network

  • a Shared VPC network

  • a network with VPC Network Peering enabled

Upvotes: 1
