Reputation: 4778
According to the public documentation, it is possible to run a Cloud Dataflow job on Shielded VMs on GCP.
For a non-templated job, such as the one described in the Quick Start manual for Java, this can be achieved by passing the --dataflowServiceOptions=enable_secure_boot flag as follows:
mvn -Pdataflow-runner compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Djava.util.logging.config.file=logging.properties -Dexec.args="--project=${PROJECT_ID} \
--gcpTempLocation=gs://${BUCKET_NAME}/temp/ \
--output=gs://${BUCKET_NAME}/output \
--runner=DataflowRunner \
--region=${REGION} \
--dataflowServiceOptions=enable_secure_boot"
But when running a templated job, e.g. one started with gcloud or Terraform:
gcloud dataflow jobs run word-count --gcs-location gs://dataflow-templates-europe-west3/latest/Word_Count --region ${REGION} --staging-location gs://${BUCKET_NAME}/temp --parameters inputFile=gs://${BUCKET_NAME}/sample.txt,output=gs://${BUCKET_NAME}/sample-output
the VM that gets started is not Shielded (judging by its "Secure Boot" flag at runtime).
How can I run a templated Dataflow job in a Shielded VM on GCP?
Upvotes: 1
Views: 380
Reputation: 41
Posting an update since I recently ran into this myself. Shielded VMs no longer need to be requested explicitly when deploying Dataflow jobs. As of June 1, 2022, the Dataflow service deploys Shielded VMs automatically.
https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-shielded-vm
Upvotes: 1
Reputation: 1906
To deploy the Dataflow job on Shielded VMs, the additional-experiments flag has to be set to enable_secure_boot. I tested this and was able to see that Secure Boot was on during the job runtime.
gcloud dataflow jobs run word-count-on-shielded-vm-from-gcloud --project=project-id \
--gcs-location gs://dataflow-templates-europe-west3/latest/Word_Count \
--region us-central1 --staging-location gs://bucket-name/temp \
--parameters inputFile=gs://apache-beam-samples/shakespeare/kinglear.txt,output=gs://bucket-name/sample-output \
--additional-experiments=enable_secure_boot
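To double-check the result from the command line rather than the console, something along these lines should work. This is only a sketch: secure_boot_status is a helper name I made up, and the worker instance name and zone are placeholders you would look up for your own job. It reads the shieldedInstanceConfig.enableSecureBoot field of the Compute Engine instance.

```shell
#!/usr/bin/env bash
# Sketch: print the Secure Boot setting of a single Dataflow worker VM.
# secure_boot_status is a hypothetical helper name, not part of gcloud.
# $1 = worker instance name, $2 = zone the worker runs in.
secure_boot_status() {
  gcloud compute instances describe "$1" \
    --zone "$2" \
    --format="value(shieldedInstanceConfig.enableSecureBoot)"
}

# Example call (placeholder instance name and zone):
# secure_boot_status my-dataflow-worker us-central1-a
```

The worker VMs only exist while the job is running, so the check has to be made during the job runtime.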
With Terraform, adding the additional_experiments argument with enable_secure_boot to the google_dataflow_job resource deploys the Dataflow job on Shielded VMs:
resource "google_dataflow_job" "word_count_job" {
  name              = "sample-dataflow-wordcount-job"
  template_gcs_path = "gs://dataflow-templates-europe-west3/latest/Word_Count"
  temp_gcs_location = "${google_storage_bucket.bucket.url}/temp"
  parameters = {
    inputFile = "${google_storage_bucket.bucket.url}/input_file.txt",
    output    = "${google_storage_bucket.bucket.url}/word_count.txt"
  }
  additional_experiments = [
    "enable_secure_boot"
  ]
}
Upvotes: 1