Andrew

Reputation: 6860

Can Airflow run streaming GCP Dataflow jobs?

I am looking for orchestration software for streaming GCP Dataflow jobs - something that can provide alerting, status, job launching, etc., akin to what this does on Kubernetes. The answer here suggests Airflow, since it has some hooks into GCP - this would be nice because we have some other infrastructure that runs on Airflow. However, I am not sure whether it would be able to handle streaming jobs - my understanding is that Airflow is designed for tasks that will complete, which is not the case for a streaming job. Is Airflow appropriate for this? Or is there different software I should use?

Upvotes: 4

Views: 1523

Answers (1)

MANISH ZOPE

Reputation: 1201

It's probably late, but I'm answering for people who visit this topic in the future.

Yes, you can definitely run a Dataflow streaming job from Airflow. Use Airflow version 1.9 or above.

Links: https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py

https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataflow_operator.py

You don't need to put in any extra effort to run a streaming job. The Dataflow operators above run both batch and streaming jobs. They mark the Airflow task successful as soon as the Dataflow streaming job starts running (i.e., the job is in the running state).
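To make that completion rule concrete, here is a rough sketch of the polling logic in plain Python. It is not the hook's actual code: `wait_until_launched`, `get_job` and `fake_job` are hypothetical names made up for illustration, and I am assuming the job description is a dict with `currentState` and `type` keys holding Dataflow API values such as `JOB_STATE_RUNNING` and `JOB_TYPE_STREAMING`.

```python
import time

# Dataflow job states treated here as a failed launch.
FAILED_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_until_launched(get_job, poll_seconds=5.0, max_polls=100):
    """Poll until the launching task can be considered finished.

    get_job is a hypothetical callable that returns a dict with
    'currentState' and 'type' keys describing the job.
    """
    for _ in range(max_polls):
        job = get_job()
        state = job["currentState"]
        if state == "JOB_STATE_DONE":
            # Batch jobs finish on their own, so wait for completion.
            return state
        if state == "JOB_STATE_RUNNING" and job["type"] == "JOB_TYPE_STREAMING":
            # A streaming job never completes, so "running" counts as success.
            return state
        if state in FAILED_STATES:
            raise RuntimeError("Dataflow job ended in state " + state)
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach a final state in time")


def fake_job(job_type, states):
    """Stand-in for a real API call: steps through `states`, then repeats the last."""
    remaining = iter(states)
    last = [None]

    def get_job():
        last[0] = next(remaining, last[0])
        return {"currentState": last[0], "type": job_type}

    return get_job


if __name__ == "__main__":
    streaming = fake_job("JOB_TYPE_STREAMING", ["JOB_STATE_PENDING", "JOB_STATE_RUNNING"])
    print(wait_until_launched(streaming, poll_seconds=0))
```

The consequence for the question is that Airflow only tells you the streaming job was launched successfully. Anything that happens after that point (the job failing hours later, for example) is not reflected in the task status, so ongoing alerting would need a separate check.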

Upvotes: 3
