Reputation: 175
I have multiple Google Dataflow jobs for data collection and ETL purposes, followed by a Google Dataproc (Spark) job for further machine learning.
I would like to tie these jobs together into a workflow so that I can schedule the whole workflow.
Do you have any suggestions or products that could help with this?
Upvotes: 2
Views: 1133
Reputation: 175
We have implemented two approaches for this:

1. A custom solution for invoking the Dataproc jobs: a Spring scheduler invokes Dataproc and Dataflow through the Google SDK APIs (a sketch of the submission call follows this list).
2. A Dataproc job running in streaming mode that manages the other Dataproc and Dataflow jobs: we publish a message to Pub/Sub, the streaming job receives it and then invokes the rest of the chain (see the trigger sketch at the end of this answer).
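For illustration, here is a minimal sketch of submitting a Spark job to Dataproc through the Google Cloud client library. It is shown in Python for brevity; a Spring scheduler would make the equivalent calls through the Java client. The project, region, cluster, bucket, and main class names are placeholders, not values from our setup.

    from google.cloud import dataproc_v1

    # All names below are placeholders.
    project_id = "my-project"
    region = "us-central1"
    cluster_name = "ml-cluster"

    # The JobControllerClient must point at the regional endpoint.
    job_client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    job = {
        "placement": {"cluster_name": cluster_name},
        "spark_job": {
            "main_class": "com.example.MlPipeline",               # hypothetical Spark entry point
            "jar_file_uris": ["gs://my-bucket/ml-pipeline.jar"],  # hypothetical jar location
        },
    }

    # Submit the job and block until it completes.
    operation = job_client.submit_job_as_operation(
        request={"project_id": project_id, "region": region, "job": job}
    )
    finished = operation.result()
    print(f"Job {finished.reference.job_id} finished in state {finished.status.state.name}")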
I prefer the 2nd solution over the 1st because the 1st means managing the Spring application ourselves (deployment via CloudFormation, etc.).
The 2nd solution comes with the extra cost of running a Dataproc job 24/7.
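For the 2nd approach, the trigger pattern looks roughly like the sketch below. In our case the listener is the streaming Dataproc job itself; this standalone Python Pub/Sub subscriber is only meant to show the receive-message-then-launch-next-job flow. The project and subscription names are placeholders.

    from google.cloud import pubsub_v1

    # Placeholder project and subscription names.
    project_id = "my-project"
    subscription_id = "workflow-trigger-sub"

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        # The message payload tells the listener which step to run next;
        # here it is only printed, but this is where the next Dataproc or
        # Dataflow job would be launched (e.g. with the submission call above).
        print(f"Received trigger: {message.data.decode('utf-8')}")
        message.ack()

    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    with subscriber:
        try:
            streaming_pull_future.result()  # block, handling triggers as they arrive
        except KeyboardInterrupt:
            streaming_pull_future.cancel()
            streaming_pull_future.result()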
Upvotes: 0
Reputation: 4041
I don't know of any great answers on GCP right now, but here are a couple of options:
Upvotes: 1