Reputation: 73
I am using a Dataproc Workflow Template to run a Spark job. I want to pass the input file dynamically to the Spark job args when instantiating the template through the Dataproc Workflow Template. How can I achieve this?
Upvotes: 2
Views: 512
Reputation: 26548
See Parameterization of Dataproc Workflow Templates.
Example template (my_template.yaml):
...
jobs:
- stepId: job1
  sparkJob:
    ...
    args:
    - 'input file URI'
    - 'output directory'
parameters:
- name: INPUT_FILE
  fields:
  - jobs['job1'].sparkJob.args[0]
- name: OUTPUT_DIR
  fields:
  - jobs['job1'].sparkJob.args[1]
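For reference, on the Spark job side the parameterized args arrive as ordinary positional arguments. Here is a minimal sketch of a hypothetical PySpark main file (not from the question) that consumes them; the file name and transformation are illustrative only:

import sys
from pyspark.sql import SparkSession

def main():
    # args[0] and args[1] from the workflow template arrive as positional arguments
    input_file = sys.argv[1]
    output_dir = sys.argv[2]

    spark = SparkSession.builder.appName("job1").getOrCreate()

    # read whatever INPUT_FILE points at and write results to OUTPUT_DIR
    df = spark.read.text(input_file)
    df.write.mode("overwrite").text(output_dir)

if __name__ == "__main__":
    main()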
Create/import the template:
gcloud dataproc workflow-templates import my-template \
    --region=<region> \
    --source=my_template.yaml
Instantiate the template with args:
gcloud dataproc workflow-templates instantiate my-template \
    --region=<region> \
    --parameters=INPUT_FILE=gs://my-bucket/test.txt,OUTPUT_DIR=gs://my-bucket/output/
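If you prefer to instantiate programmatically rather than via gcloud, the google-cloud-dataproc Python client accepts the same parameter map. A rough sketch, assuming the template already exists and using placeholder project/region values:

from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder values for illustration
region = "us-central1"
template_id = "my-template"

# Dataproc requires the regional endpoint for workflow template calls
client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

name = f"projects/{project_id}/regions/{region}/workflowTemplates/{template_id}"

# keys must match the names declared in the template's `parameters` section
operation = client.instantiate_workflow_template(
    request={
        "name": name,
        "parameters": {
            "INPUT_FILE": "gs://my-bucket/test.txt",
            "OUTPUT_DIR": "gs://my-bucket/output/",
        },
    }
)
operation.result()  # block until the workflow completes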
Upvotes: 2