Reputation: 45
We are using Dataflow Flex Templates and following this guide (https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates) to stage and launch jobs. This is working in our environment. However, when I SSH onto the Dataflow VM and run docker ps
I see it is referencing the a different docker image to the one we speccify in our template (underlined in green):
The template I am launching from is as follows and jobs are created using gcloud beta dataflow flex-template run
:
{
"image": "gcr.io/<MY PROJECT ID>/samples/dataflow/streaming-beam-sql:latest",
"metadata": {
"description": "An Apache Beam streaming pipeline that reads JSON encoded messages from Pub/Sub, uses Beam SQL to transform the message data, and writes the results to a BigQuery",
"name": "Streaming Beam SQL",
"parameters": [
{
"helpText": "Pub/Sub subscription to read from.",
"label": "Pub/Sub input subscription.",
"name": "inputSubscription",
"regexes": [
".*"
]
},
{
"helpText": "BigQuery table spec to write to, in the form 'project:dataset.table'.",
"is_optional": true,
"label": "BigQuery output table",
"name": "outputTable",
"regexes": [
"[^:]+:[^.]+[.].+"
]
}
]
},
"sdkInfo": {
"language": "JAVA"
}
}
So I would expect the output of docker ps
to show gcr.io/<MY PROJECT ID>/samples/dataflow/streaming-beam-sql
as the image on Dataflow. When I launch the image from GCR to run on a GCE instance I get the following output when running docker ps
:
Should I expect to see the name of the image I have referenced in the Dataflow template on the Dataflow VM? Or have I missed a step somewhere?
Thanks!
Upvotes: 2
Views: 1717
Reputation: 172
TLDR; You are looking in the worker VM instead of launcher VM.
In case of flex templates, when you run the job, it first creates a launcher VM where it pulls your container and runs it to generate the job graph. This VM will destroyed after this step is completed. Then the worker VM is started to actually run the generated job graph. In the worker VM there is no need for your container. Your container is used only to generate the job graph based on the parameters passed.
In your case, you are trying to search for your image in the worker VM. The launcher VM is short lived and starts with launcher-*********************. If you SSH into that VM and do docker ps
you will be able to see your container image.
Upvotes: 5