no-stale-reads

Reputation: 358

Use Google Cloud Workflows to trigger Dataproc Batch job

My scenario demands an orchestrator, since the jobs in a flow (a DAG) are connected/codependent. Cloud Composer is too expensive since we only have a few jobs to run (it's not worth it).

I've been looking around, and it looks like Google Cloud Workflows could help me orchestrate my workflows/DAGs.

But I haven't been able to find any documentation or example showing how to trigger a Dataproc Batch job from the Workflows YAML file.

Triggering a Cloud Function that launches a Dataproc batch job via the SDK is not an option, since (as I said) I need to know when one task finishes before starting the next. With Functions I wouldn't have that kind of control.

Do you have any idea how (and whether it's possible) to create a Dataproc Batch job from a Google Cloud Workflow?

Upvotes: 2

Views: 777

Answers (1)

me_L_coding

Reputation: 354

Yes, it's possible! Since Workflows can make `http.post` requests, you can call the Dataproc batches REST API directly (here).

Then use `http.get` to wait for the batch to finish, polling its `state` field until it reaches `SUCCEEDED` (or a terminal failure state).
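A minimal sketch of what that could look like in a Workflows YAML definition. Note the project ID, region, bucket, batch ID, and PySpark file below are placeholders you'd replace with your own values:

```yaml
main:
  steps:
    - init:
        assign:
          # Hypothetical values - substitute your own
          - project: "my-project"
          - region: "us-central1"
          - batch_id: "my-batch-001"
          - base_url: ${"https://dataproc.googleapis.com/v1/projects/" + project + "/locations/" + region + "/batches"}
    - create_batch:
        call: http.post
        args:
          url: ${base_url}
          query:
            batchId: ${batch_id}
          auth:
            type: OAuth2
          body:
            pysparkBatch:
              mainPythonFileUri: "gs://my-bucket/my_job.py"
    - poll_batch:
        call: http.get
        args:
          url: ${base_url + "/" + batch_id}
          auth:
            type: OAuth2
        result: batch
    - check_state:
        switch:
          - condition: ${batch.body.state == "SUCCEEDED"}
            next: done
          - condition: ${batch.body.state == "FAILED" or batch.body.state == "CANCELLED"}
            raise: ${batch.body}
        next: wait
    - wait:
        call: sys.sleep
        args:
          seconds: 30
        next: poll_batch
    - done:
        return: ${batch.body.state}
```

Because the workflow only advances past `check_state` once the batch reaches a terminal state, you can chain a second batch step after `done` and get exactly the task-to-task sequencing you need.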

Upvotes: 1
