Reputation: 11
I've tried to add a spark-r
job step to my workflow template in two different ways.
Using a gcloud command:
gcloud beta dataproc workflow-templates add-job spark-r gs://path/to/script.R \
--step-id=<stepid> --workflow-template=<templateid>
Or by importing a YAML definition:
jobs:
- sparkRJob:
    mainRFileUri: gs://path/to/script.R
  stepId: <stepid>
placement:
  managedCluster:
    clusterName: cluster-sparkr
    config:
      gceClusterConfig:
        zoneUri: europe-west4-b
      masterConfig:
        machineTypeUri: n1-standard-4
      workerConfig:
        machineTypeUri: n1-standard-4
        numInstances: 4
However, both ways result in the following error:
INVALID_ARGUMENT: Job "" must provide a job definition
This leaves me slightly confused as to what exactly I am missing.
Upvotes: 1
Views: 475
Reputation: 4457
I have tested your YAML definition and it worked for me with the command:
gcloud beta dataproc workflow-templates instantiate-from-file --file <definition.yaml>
Also, a workflow template with a SparkR job was successfully created using the gcloud commands:
gcloud beta dataproc workflow-templates create my-test-wf-01
gcloud beta dataproc workflow-templates add-job spark-r gs://path/to/script.R \
--step-id=my-test-step-id --workflow-template=my-test-wf-01
Output of the 2nd command above:
createTime: '2019-04-15T16:49:06.346Z'
id: my-test-wf-01
jobs:
- sparkRJob:
    mainRFileUri: gs://path/to/script.R
  stepId: my-test-step-id
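As a side note (not part of your original setup, just a sketch assuming the template ID above and the cluster settings from your YAML), a template created with the gcloud commands still needs a managed-cluster placement before it can be run:
# Attach a managed cluster, mirroring the managedCluster block of the YAML
gcloud beta dataproc workflow-templates set-managed-cluster my-test-wf-01 \
  --cluster-name=cluster-sparkr \
  --zone=europe-west4-b \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --num-workers=4
# Run the workflow template
gcloud beta dataproc workflow-templates instantiate my-test-wf-01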
Upvotes: 1