Sergey Shcherbakov

Reputation: 4778

How to run a templated Cloud Dataflow job in Shielded VMs on GCP

According to the public documentation, it is possible to run a Cloud Dataflow job in Shielded VMs on GCP.

For a non-templated job, such as the one described in the Quick Start manual for Java, this can be achieved by passing the --dataflowServiceOptions=enable_secure_boot flag as follows:

mvn -Pdataflow-runner compile exec:java \
-Dexec.mainClass=org.apache.beam.examples.WordCount \
-Djava.util.logging.config.file=logging.properties \
-Dexec.args="--project=${PROJECT_ID} \
--gcpTempLocation=gs://${BUCKET_NAME}/temp/ \
--output=gs://${BUCKET_NAME}/output \
--runner=DataflowRunner \
--region=${REGION} \
--dataflowServiceOptions=enable_secure_boot"

But when I start a templated job, e.g. using gcloud or Terraform:

gcloud dataflow jobs run word-count \
--gcs-location gs://dataflow-templates-europe-west3/latest/Word_Count \
--region ${REGION} --staging-location gs://${BUCKET_NAME}/temp \
--parameters inputFile=gs://${BUCKET_NAME}/sample.txt,output=gs://${BUCKET_NAME}/sample-output

the VM that gets started is not Shielded (judging by its "Secure Boot" flag at runtime).

How can I run a templated Dataflow job in a Shielded VM on GCP?

Upvotes: 1

Views: 380

Answers (2)

Martin Beck

Reputation: 41

Posting an update since I recently ran into this myself. Shielded VMs no longer need to be requested explicitly when deploying Dataflow jobs: as of June 1, 2022, the Dataflow service deploys Shielded VMs automatically.

https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-shielded-vm

Upvotes: 1

Kabilan Mohanraj

Reputation: 1906

gcloud

To deploy the Dataflow job on Shielded VMs, pass enable_secure_boot through the --additional-experiments flag. I tested this and confirmed that Secure Boot was on while the job was running.

gcloud dataflow jobs run word-count-on-shielded-vm-from-gcloud --project=project-id \
--gcs-location gs://dataflow-templates-europe-west3/latest/Word_Count \
--region us-central1 --staging-location gs://bucket-name/temp \
--parameters inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt,output=gs://bucket-name/sample-output \
--additional-experiments=enable_secure_boot
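
To repeat the runtime check mentioned above from the command line, one option is to read the worker VM's Shielded VM configuration with gcloud. This is only a sketch: the helper name secure_boot_status is made up for illustration, and the instance name and zone are placeholders that you would look up yourself (for example under Compute Engine in the console while the job is running).

```shell
# Hypothetical helper: prints the Secure Boot setting that Compute Engine
# reports for a single VM. Both arguments are placeholders supplied by the
# caller: $1 = worker instance name, $2 = zone the worker runs in.
secure_boot_status() {
  local instance="$1" zone="$2"
  gcloud compute instances describe "$instance" \
    --zone "$zone" \
    --format="value(shieldedInstanceConfig.enableSecureBoot)"
}

# Example call (the worker name and zone here are assumptions):
# secure_boot_status my-dataflow-worker-name us-central1-a
```

The worker VMs only exist while the job is running, so the check has to be made during the job's runtime.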

Terraform

Adding the additional_experiments argument with enable_secure_boot to the google_dataflow_job resource deploys the Dataflow job on Shielded VMs.

resource "google_dataflow_job" "word_count_job" {
  name              = "sample-dataflow-wordcount-job"
  template_gcs_path = "gs://dataflow-templates-europe-west3/latest/Word_Count"
  temp_gcs_location = "${google_storage_bucket.bucket.url}/temp"
  parameters = {
    inputFile = "${google_storage_bucket.bucket.url}/input_file.txt"
    output    = "${google_storage_bucket.bucket.url}/word_count.txt"
  }
  additional_experiments = [
    "enable_secure_boot"
  ]
}

Upvotes: 1
