recyclinguy

Reputation: 99

Creating a Dataflow classic template to orchestrate the job via DataflowTemplatedJobStartOperator

I am trying to create and stage a Dataflow classic template, following the document linked below:

https://cloud.google.com/dataflow/docs/guides/templates/creating-templates#java_8

mvn compile exec:java \
-Dexec.mainClass=com.example.myclass \
-Dexec.args="--runner=DataflowRunner \
--project=PROJECT_ID \
--stagingLocation=gs://BUCKET_NAME/staging \
--templateLocation=gs://BUCKET_NAME/templates/TEMPLATE_NAME \
--region=REGION"

Composer

start_job = DataflowTemplatedJobStartOperator(
    task_id="start_job",
    template='gs://bucket/latest/job1',
    parameters={'inputFile': API_ENDPOINT, 'output': GCS_OUTPUT},
    location='REGION',
)

My understanding is that I first have to use mvn compile to build and stage the template, and then pass the parameters (as JSON) to the Dataflow templated operator.

I am not sure how the parameters and the template have to be created. In the past I have manually built the Dataflow jar file and deployed it to a GCS bucket. From the document it looks like mvn compile has to be used to compile and stage the template, and previously the parameters were passed to the job with the Dataflow Java operator. But with the new design I suppose I have to pass the parameters via DataflowTemplatedJobStartOperator.

Has anyone used the templated method to orchestrate a custom Java Dataflow job, and if so, how is the JSON file used to pass the arguments? Any example would be very helpful. Currently I am compiling the binary on my local machine and uploading it to the bucket.

Appreciate any suggestion to solve the issue.

Regards

Upvotes: 1

Views: 1654

Answers (2)

Prajna Rai T

Reputation: 1810

You can follow this quickstart to deploy a Dataflow classic template using mvn compile.

You don’t have to store the parameters in a JSON file; you can pass them directly, as shown below, using DataflowTemplatedJobStartOperator. For more information you can refer to this document.

from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

start_template_job = DataflowTemplatedJobStartOperator(
    task_id="start-template-job",
    # Path to the staged template spec in GCS (here the public Word_Count sample).
    template='gs://dataflow-templates/latest/Word_Count',
    # Keys must match the pipeline's runtime (ValueProvider) option names.
    parameters={'inputFile': "gs://dataflow-samples/shakespeare/kinglear.txt", 'output': GCS_OUTPUT},
    location='europe-west3',
)

It is not possible to run mvn just to package and send the jar file to staging without running the pipeline.

Upvotes: 2

recyclinguy

Reputation: 99

I created the template file from the Bitbucket repo and staged the template in GCS.

mvn compile exec:java \
-Dexec.mainClass=com.google.cloud.teleport.templates.<template-class> \
-Dexec.cleanupDaemonThreads=false \
-Dexec.args=" \
--project=<project-id> \
--stagingLocation=gs://<bucket-name>/staging \
--tempLocation=gs://<bucket-name>/temp \
--templateLocation=gs://<bucket-name>/templates/<template-name>.json \
--runner=DataflowRunner"

Maven also required extra parameters to create the template.

Question - Do I need to create a JSON file for passing parameters to the template (i.e. by creating a separate folder in the template location)? I think I would be duplicating the parameters again in the JSON file.

How do I reference the JSON file in the Dataflow operator? What goes in parameters?

start_job = DataflowTemplatedJobStartOperator(
    task_id="start_job",
    template='gs://template location',
    # What should the parameters be? Should they point to the JSON file?
    parameters={},
    location='xxx',
)

Another question I have: when we deploy to higher environments, especially UAT and PROD, we are not allowed to run the pipeline until the allocated schedule time arrives. So the question is, is it possible to run mvn just to package and send the jar file to staging without running the pipeline?

Regards

Upvotes: 0
