Rolintocour

Reputation: 3168

Customize Dataproc workflows

I use workflows with Dataproc. There are 3 things I'd like to do:

Is there a way to achieve those? Thanks.

Upvotes: 2

Views: 65

Answers (2)

Igor Dvorzhak

Reputation: 4465

You may be better off using a more general orchestration tool, Cloud Composer (managed Apache Airflow), instead of Dataproc Workflows. It has all the features you need and supports Dataproc as well.

Upvotes: 0

tix

Reputation: 2158

Thanks for reaching out. We intentionally didn't implement some features until we had clear demand.

I would suggest filing a feature request for #1 and #2 with a use case at [1].

Supporting job retries (via Restartable Jobs) or adding policies like proceed-on-failure in Workflows seems reasonable.

I am not sure what you're requesting in #3 (which scheduler?). Cloud Functions are triggered by HTTP requests, by files in GCS, or by Pub/Sub notifications. You should be able to use PySpark with a client library to trigger a function via any of these paths.
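As a rough illustration of the HTTP path, here is a minimal sketch that could run from the driver of a PySpark job once the work is done. It uses only the Python standard library. The function URL and the payload fields are hypothetical placeholders, and it assumes the function accepts unauthenticated JSON POSTs; a function that requires authentication would also need an identity token in an `Authorization` header.

```python
import json
import urllib.request


def build_trigger_request(function_url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an HTTP-triggered Cloud Function."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_function(function_url: str, payload: dict, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    Raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_trigger_request(function_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call at the end of a PySpark driver script.
# The URL and payload below are placeholders, not real endpoints:
# trigger_function(
#     "https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME",
#     {"job": "nightly-etl", "status": "done"},
# )
```

The GCS and Pub/Sub paths would look similar in spirit: the job writes a marker file to a bucket or publishes a message, and the function is configured to fire on that event.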

[1] https://cloud.google.com/support/docs/issue-trackers

Upvotes: 2
