Andrew

Reputation: 6860

Is there a way to get the parameters that were passed to a GCP Dataflow job from the CLI/API?

I have tried the describe command listed here and I don't see the parameters. Is there another command that I should use to get this information, or some other API that would provide it?

Upvotes: 1

Views: 2181

Answers (1)

Tuxdude

Reputation: 49473

TL;DR - You're missing the --full argument to the gcloud dataflow jobs describe command.

FLAGS

--full

Retrieve the full Job rather than the summary view

View full job info

If you're using gcloud to inspect a GCP Dataflow job, this command shows the full job info (which is quite a lot), including any parameters that were passed to the job:

gcloud dataflow jobs describe JOB_ID --full

All of the options appear under the hierarchy environment.sdkPipelineOptions.options in that output.

View all options as JSON

To view all the options passed to the job as JSON (this prints more than just the command-line arguments), you can do:

$ gcloud dataflow jobs describe JOB_ID --full --format='json(environment.sdkPipelineOptions.options)'
{
  "environment": {
    "sdkPipelineOptions": {
      "options": {
        "apiRootUrl": "https://dataflow.googleapis.com/",
        "appName": "WordCount",
        "credentialFactoryClass": "com.google.cloud.dataflow.sdk.util.GcpCredentialFactory",
        "dataflowEndpoint": "",
        "enableCloudDebugger": false,
        "enableProfilingAgent": false,
        "firstArg": "foo",
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "jobName": "wordcount-tuxdude-12345678",
        "numberOfWorkerHarnessThreads": 0,
        "output": "gs://BUCKET_NAME/dataflow/output",
        "pathValidatorClass": "com.google.cloud.dataflow.sdk.util.DataflowPathValidator",
        "project": "PROJECT_NAME",
        "runner": "com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner",
        "secondArg": "bar",
        "stableUniqueNames": "WARNING",
        "stagerClass": "com.google.cloud.dataflow.sdk.util.GcsStager",
        "stagingLocation": "gs://BUCKET_NAME/dataflow/staging/",
        "streaming": false,
        "tempLocation": "gs://BUCKET_NAME/dataflow/staging/"
      }
    }
  }
}
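If you want these options in a script rather than on the terminal, the JSON output above is straightforward to post-process. Here is a minimal sketch in Python, assuming the output has the same nesting as shown above. The `extract_options` helper and the trimmed `SAMPLE_OUTPUT` string are illustrative names of my own, not part of gcloud:

```python
import json


def extract_options(describe_json):
    """Return the pipeline options dict from the JSON printed by
    `gcloud dataflow jobs describe JOB_ID --full --format=json(...)`.

    Hypothetical helper: returns an empty dict if any level of the
    environment.sdkPipelineOptions.options hierarchy is missing or null.
    """
    job = json.loads(describe_json)
    environment = job.get("environment") or {}
    sdk_options = environment.get("sdkPipelineOptions") or {}
    return sdk_options.get("options") or {}


# Trimmed copy of the sample output above, used only for illustration.
SAMPLE_OUTPUT = """
{
  "environment": {
    "sdkPipelineOptions": {
      "options": {
        "appName": "WordCount",
        "firstArg": "foo",
        "secondArg": "bar",
        "streaming": false
      }
    }
  }
}
"""

# Print every option as name = value, sorted by name.
for name, value in sorted(extract_options(SAMPLE_OUTPUT).items()):
    print(f"{name} = {value}")
```

In practice you would feed it the real command output, for example by piping gcloud into the script and reading `sys.stdin.read()` instead of the embedded sample.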

View all options as a table

$ gcloud dataflow jobs describe JOB_ID --full --format='flattened(environment.sdkPipelineOptions.options)'
environment.sdkPipelineOptions.options.apiRootUrl:                   https://dataflow.googleapis.com/
environment.sdkPipelineOptions.options.appName:                      WordCount
environment.sdkPipelineOptions.options.credentialFactoryClass:       com.google.cloud.dataflow.sdk.util.GcpCredentialFactory
environment.sdkPipelineOptions.options.dataflowEndpoint:
environment.sdkPipelineOptions.options.enableCloudDebugger:          False
environment.sdkPipelineOptions.options.enableProfilingAgent:         False
environment.sdkPipelineOptions.options.firstArg:                     foo
environment.sdkPipelineOptions.options.inputFile:                    gs://dataflow-samples/shakespeare/kinglear.txt
environment.sdkPipelineOptions.options.jobName:                      wordcount-tuxdude-12345678
environment.sdkPipelineOptions.options.numberOfWorkerHarnessThreads: 0
environment.sdkPipelineOptions.options.output:                       gs://BUCKET_NAME/dataflow/output
environment.sdkPipelineOptions.options.pathValidatorClass:           com.google.cloud.dataflow.sdk.util.DataflowPathValidator
environment.sdkPipelineOptions.options.project:                      PROJECT_NAME
environment.sdkPipelineOptions.options.runner:                       com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner
environment.sdkPipelineOptions.options.secondArg:                    bar
environment.sdkPipelineOptions.options.stableUniqueNames:            WARNING
environment.sdkPipelineOptions.options.stagerClass:                  com.google.cloud.dataflow.sdk.util.GcsStager
environment.sdkPipelineOptions.options.stagingLocation:              gs://BUCKET_NAME/dataflow/staging/
environment.sdkPipelineOptions.options.streaming:                    False
environment.sdkPipelineOptions.options.tempLocation:                 gs://BUCKET_NAME/dataflow/staging/

Get the value of just a single option

To get the value of a single option, for example one passed as --argName with the value MY_ARG_VALUE, you can do:

$ gcloud dataflow jobs describe JOB_ID --full --format='value(environment.sdkPipelineOptions.options.argName)'
MY_ARG_VALUE
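The same single-option lookup can be wrapped for use from a script. This is a rough sketch, assuming gcloud is installed, on the PATH, and already authenticated. The function names `describe_option_command` and `get_option` are hypothetical, and the only gcloud flags used are the ones shown in the command above:

```python
import subprocess
from typing import List

# Hierarchy under which the job's options are reported (see above).
OPTIONS_PATH = "environment.sdkPipelineOptions.options"


def describe_option_command(job_id: str, option_name: str) -> List[str]:
    """Build the argument list for the gcloud call that prints one option."""
    return [
        "gcloud", "dataflow", "jobs", "describe", job_id,
        "--full",
        f"--format=value({OPTIONS_PATH}.{option_name})",
    ]


def get_option(job_id: str, option_name: str) -> str:
    """Run gcloud and return the option's value as a stripped string.

    Raises subprocess.CalledProcessError if gcloud exits with an error.
    """
    result = subprocess.run(
        describe_option_command(job_id, option_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example usage (requires gcloud and a real job ID):
#   print(get_option("JOB_ID", "argName"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with the parentheses in the --format expression.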

gcloud formatting

gcloud supports a wide range of output formatting options, and they apply to most gcloud commands that fetch information from the server. You can read about them here, or by running gcloud topic formats.

Upvotes: 4
