Nutel
Nutel

Reputation: 2294

Using airflow for real time job orchestration

I have an application that runs as a web service, which submits jobs to Spark on a user request. A job queue needs to be limited per user. I am planning to use Airflow as an orchestration framework to manage job queues but while it supports parallel DAG execution it's optimized for batch processing rather than real time. Is Airflow designed to handle ~200 DAG executions per second with multiple queues (one per user) or should I look for alternatives?

Upvotes: 6

Views: 9889

Answers (1)

Chengzhi
Chengzhi

Reputation: 2591

Do you have data move from one task to another? Does time matter here since you mentioned real-time. With Airflow, workflows are expected to be mostly static or slowly changing. Mostly for ETL batch processing, you can speed up the airflow heartbeat, but would be good to have a POC with your use case to test out.
Below is from Airflow official document: https://airflow.apache.org/#beyond-the-horizon

Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!). Airflow is not in the Spark Streaming or Storm space, it is more comparable to Oozie or Azkaban

Upvotes: 5

Related Questions