Reputation: 3168
I use workflows with Dataproc. There are 3 things I'd like to do:
Instantiate a single workflow step. Sometimes a workflow crashes, and I don't want to run the whole workflow again, only from a given step onward.
Parameters are limited. Sometimes there are URL templates I'd like to define in the workflow, with a parameter being only part of the URL, for example:
jobs:
- sparkJob:
    args:
    - --myarg
    - gs://base-url/the-param-I-want-to-parametrize.csv
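For context, here is a minimal sketch of the parameter mechanism involved, written with the google-cloud-dataproc Python types (the step id and parameter name are placeholders I made up): a template parameter can only target a whole field such as args[1], so the entire URL has to be the parameter rather than just the file name inside it.

# Minimal sketch with google-cloud-dataproc types; "spark-step" and
# "INPUT_URL" are placeholder names, not values from my template.
from google.cloud import dataproc_v1

param = dataproc_v1.TemplateParameter(
    name="INPUT_URL",
    # The field path can only point at the whole second argument; there is
    # no way to parametrize just the file-name portion of the gs:// URL.
    fields=["jobs['spark-step'].sparkJob.args[1]"],
)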
From a workflow, I'd like to disable a task in the scheduler and also call a Cloud Function. Is this possible?
Is there a way to achieve those? Thanks.
Upvotes: 2
Views: 65
Reputation: 4465
You might be better off using a more generic orchestration solution, Cloud Composer (managed Apache Airflow), instead of Dataproc Workflows. It has all the features you need and supports Dataproc too.
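For illustration, here is a minimal sketch of such a DAG, assuming an Airflow 2.x Composer environment with the Google and HTTP providers installed; the project, region, cluster name, main class, bucket path, and the "cloud_function_http" connection are placeholders, not values from your question. Failed tasks can be cleared and re-run individually from the Airflow UI (your #1), Jinja templating can build just part of a URL (your #2), and an HTTP task can call a Cloud Function (your #3).

# Minimal sketch of a Composer/Airflow DAG; all IDs and URLs are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.http.operators.http import SimpleHttpOperator

PROJECT_ID = "my-project"      # placeholder
REGION = "us-central1"         # placeholder
CLUSTER_NAME = "my-cluster"    # placeholder

with DAG(
    "dataproc_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Only part of the URL varies: Jinja composes it per run, so the
    # "parameter" is just the date portion of the path.
    spark_step = DataprocSubmitJobOperator(
        task_id="spark_step",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "reference": {"project_id": PROJECT_ID},
            "placement": {"cluster_name": CLUSTER_NAME},
            "spark_job": {
                "main_class": "com.example.Main",  # placeholder entry point
                "args": ["--myarg", "gs://base-url/{{ ds }}.csv"],
            },
        },
    )

    # Call a Cloud Function over HTTP; "cloud_function_http" is an Airflow
    # connection whose host points at the function's HTTPS trigger URL.
    call_function = SimpleHttpOperator(
        task_id="call_cloud_function",
        http_conn_id="cloud_function_http",
        endpoint="/",
        method="POST",
    )

    spark_step >> call_function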
Upvotes: 0
Reputation: 2158
Thanks for reaching out. We have intentionally held off on implementing some features until there is clear demand.
I would suggest filing a feature request for #1 and #2 with a use case at [1].
Supporting job retries (via Restartable Jobs) or adding policies like proceed-on-failure to Workflows seems reasonable.
I am not sure what you're requesting in #3 (which scheduler?). Cloud Functions are triggered via HTTP requests, files in GCS, or Pub/Sub notifications. You should be able to use PySpark with a client library to trigger a function via any of these paths.
[1] https://cloud.google.com/support/docs/issue-trackers
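As an example of that suggestion, here is a minimal sketch of triggering a Cloud Function from inside a PySpark job, either by calling its HTTPS endpoint or by publishing to a Pub/Sub topic the function subscribes to. The URL, project, and topic names are placeholders, and the HTTP path assumes the function accepts the call (or that you attach an identity token to the request).

# Minimal sketch; runs inside the Spark driver of a workflow job.
import requests
from google.cloud import pubsub_v1

# Path 1: HTTP-triggered function (placeholder URL).
resp = requests.post(
    "https://us-central1-my-project.cloudfunctions.net/my-function",
    json={"source": "dataproc-workflow", "status": "step-finished"},
    timeout=30,
)
resp.raise_for_status()

# Path 2: Pub/Sub-triggered function (placeholder project and topic).
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "workflow-events")
publisher.publish(topic_path, b"step-finished").result()  # wait for the publish ack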
Upvotes: 2