Amit Nahar

Reputation: 73

How to pass and access Dataproc Spark job args when instantiating a Dataproc Workflow Template

I am using a Dataproc Workflow Template to run a Spark job, and I want to pass the input file to the Spark job's args dynamically when instantiating the template. How can I achieve this?

Upvotes: 2

Views: 512

Answers (1)

Dagang Wei

Reputation: 26548

See Parameterization of Dataproc Workflow Templates.

Example template (my_template.yaml):

...
jobs:
  - stepId: job1
    sparkJob:
      ...
      args:
      - 'input file URI'
      - 'output directory'
parameters:
- name: INPUT_FILE
  fields:
  - jobs['job1'].sparkJob.args[0]
- name: OUTPUT_DIR
  fields:
  - jobs['job1'].sparkJob.args[1]
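
For reference, a complete template with the elided sections filled in might look like the sketch below. The main class, jar URI, and cluster name are hypothetical placeholders, not values from the original question:

jobs:
  - stepId: job1
    sparkJob:
      mainClass: com.example.WordCount        # hypothetical main class
      jarFileUris:
        - gs://my-bucket/jars/wordcount.jar   # hypothetical jar location
      args:
      - 'input file URI'                      # substituted via INPUT_FILE
      - 'output directory'                    # substituted via OUTPUT_DIR
placement:
  managedCluster:
    clusterName: my-workflow-cluster          # hypothetical; a real template also defines its cluster config here
parameters:
- name: INPUT_FILE
  fields:
  - jobs['job1'].sparkJob.args[0]
- name: OUTPUT_DIR
  fields:
  - jobs['job1'].sparkJob.args[1]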

Create/import the template:

gcloud dataproc workflow-templates import my-template \
    --region=<region> \
    --source=my_template.yaml
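
To verify that the import worked and both parameters were registered, you can describe the template:

gcloud dataproc workflow-templates describe my-template \
    --region=<region>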

Instantiate the template with args:

gcloud dataproc workflow-templates instantiate my-template \
    --region=<region> \
    --parameters=INPUT_FILE=gs://my-bucket/test.txt,OUTPUT_DIR=gs://my-bucket/output/
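
At instantiation time, Dataproc substitutes each parameter value into the fields listed under that parameter, so the job effectively runs with:

args:
- 'gs://my-bucket/test.txt'
- 'gs://my-bucket/output/'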

Upvotes: 2
