Daniel Bertok

Reputation: 11

Adding spark-r job to dataproc workflow template

I've tried to add a spark-r job step to my workflow template in two different ways.

Using a gcloud command:

gcloud beta dataproc workflow-templates add-job spark-r gs://path/to/script.R \
    --step-id=<stepid> --workflow-template=<templateid>

Or by importing a YAML definition:

jobs:
- sparkRJob:
    mainRFileUri: gs://path/to/script.R
  stepId: <stepid>
placement:
  managedCluster:
    clusterName: cluster-sparkr
    config:
      gceClusterConfig:
        zoneUri: europe-west4-b
      masterConfig:
        machineTypeUri: n1-standard-4
      workerConfig:
        machineTypeUri: n1-standard-4
        numInstances: 4
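
For the import, I used a command along these lines (placeholders as above):

gcloud beta dataproc workflow-templates import <templateid> \
    --source=<path/to/definition.yaml>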

However, both ways result in the following error:

INVALID_ARGUMENT: Job "" must provide a job definition

This leaves me slightly confused as to what exactly I'm missing.

Upvotes: 1

Views: 475

Answers (1)

Igor Dvorzhak

Reputation: 4457

I have tested your YAML definition and it worked for me with the following command:

gcloud beta dataproc workflow-templates instantiate-from-file --file <definition.yaml>
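
(Note that instantiate-from-file runs the workflow directly from the YAML definition, without first creating a stored template.)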

Also, a workflow template with a Spark R job was successfully created using these gcloud commands:

gcloud beta dataproc workflow-templates create my-test-wf-01
gcloud beta dataproc workflow-templates add-job spark-r gs://path/to/script.R \
    --step-id=my-test-step-id --workflow-template=my-test-wf-01

Output of the second command above:

createTime: '2019-04-15T16:49:06.346Z'
id: my-test-wf-01
jobs:
- sparkRJob:
    mainRFileUri: gs://path/to/script.R
  stepId: my-test-step-id
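
To actually run the created template, it can then be instantiated by ID:

gcloud beta dataproc workflow-templates instantiate my-test-wf-01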

Upvotes: 1
