Reputation: 1205
I'm currently running Airflow on Amazon Web Services using EC2 instances. The big issue is that the average usage of the instances is about 2%...
I'd like to use a scalable architecture that creates instances only for the duration of a job and then kills them. I saw on the roadmap that AWS Batch was supposed to become an executor in 2017, but there has been no news about that since.
Do you know if it is possible to use AWS Batch as an executor for all Airflow jobs?
Regards, Romain.
Upvotes: 24
Views: 6551
Reputation: 955
I found this repository, which is working quite well in my case: https://github.com/aelzeiny/airflow-aws-executors. I'm using Batch jobs with a FARGATE_SPOT compute environment.
I'm still struggling with the logging in AWS CloudWatch and the return status in AWS Batch, but from Airflow's perspective it's working.
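If it helps anyone with the same logging issue: the CloudWatch log stream name for a Batch job can be pulled from the Batch API and read back with boto3. A minimal sketch (the region and job ID are placeholders):

```python
import boto3

# Region and job ID are placeholders -- substitute your own.
batch = boto3.client("batch", region_name="us-east-1")
logs = boto3.client("logs", region_name="us-east-1")

# Batch reports the CloudWatch log stream for each job attempt,
# along with the job status (SUCCEEDED, FAILED, RUNNING, ...).
job = batch.describe_jobs(jobs=["my-batch-job-id"])["jobs"][0]
print("status:", job["status"])
stream = job["container"]["logStreamName"]

# Batch containers log to the /aws/batch/job group by default.
events = logs.get_log_events(
    logGroupName="/aws/batch/job",
    logStreamName=stream,
    startFromHead=True,
)
for event in events["events"]:
    print(event["message"])
```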
Upvotes: 1
Reputation: 1190
There is no executor, but an operator is available from version 1.10. After you create a Compute Environment, Job Queue and Job Definition on AWS Batch, you can use the AWSBatchOperator to trigger jobs.
Here is the source code.
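For illustration, a minimal DAG using that operator might look like the following; the job, queue and job definition names are placeholders, and the import path is the Airflow 1.10 contrib location:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.awsbatch_operator import AWSBatchOperator

with DAG("batch_example",
         start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    submit = AWSBatchOperator(
        task_id="submit_batch_job",
        job_name="my-job",                   # name shown in the Batch console
        job_definition="my-job-definition",  # must already exist in AWS Batch
        job_queue="my-job-queue",            # must already exist in AWS Batch
        overrides={},                        # optional containerOverrides
        region_name="us-east-1",
    )
```

The operator submits the job and polls AWS Batch until it reaches a terminal state, so the Airflow task succeeds or fails with the Batch job.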
Upvotes: 7
Reputation: 4366
You would need to create a custom executor (extending BaseExecutor) capable of submitting and monitoring AWS Batch jobs. You may also need to create a custom Docker image for the instances.
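A bare-bones sketch of what that could look like, assuming a pre-existing job queue and job definition, and leaving out error handling and log collection:

```python
import time

import boto3

from airflow.executors.base_executor import BaseExecutor


class AwsBatchExecutor(BaseExecutor):
    """Sketch: run each queued Airflow task as an AWS Batch job."""

    def start(self):
        self.batch = boto3.client("batch")  # region comes from your AWS config
        self.jobs = {}                      # task key -> Batch job id

    def execute_async(self, key, command, queue=None, executor_config=None):
        # `command` is the `airflow run ...` invocation for this task (a list
        # of strings in recent Airflow versions); the job definition's image
        # must have Airflow installed so it can execute it.
        response = self.batch.submit_job(
            jobName="airflow-task",
            jobQueue="my-job-queue",             # assumption: already created
            jobDefinition="my-airflow-job-def",  # assumption: already created
            containerOverrides={"command": command},
        )
        self.jobs[key] = response["jobId"]

    def sync(self):
        # Poll Batch and report terminal states back to the scheduler.
        for key, job_id in list(self.jobs.items()):
            job = self.batch.describe_jobs(jobs=[job_id])["jobs"][0]
            if job["status"] == "SUCCEEDED":
                self.success(key)
                del self.jobs[key]
            elif job["status"] == "FAILED":
                self.fail(key)
                del self.jobs[key]

    def end(self):
        # Wait for outstanding jobs before shutting down.
        while self.jobs:
            self.sync()
            time.sleep(5)
```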
Upvotes: 1
Reputation: 45391
Currently there is a SequentialExecutor, a LocalExecutor, a DaskExecutor, a CeleryExecutor and a MesosExecutor. I heard they're working on AIRFLOW-1899, targeted for 2.0, to introduce a KubernetesExecutor. Looking at Dask and Celery, it doesn't seem like they support a mode where their workers are created per task. Mesos might, and Kubernetes should, but then you'd have to scale the worker clusters yourself so that nodes are turned off when they're not needed.
We did a little work to get a CloudFormation setup where Celery workers scale out and in based on CloudWatch metrics of the average CPU load across the tagged workers.
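As a rough illustration of that metric (a boto3 script rather than the CloudFormation template itself), the fleet-wide average CPU across tagged workers can be computed and republished as a custom metric for a scaling alarm to watch; the tag key/value and namespace here are made up:

```python
import datetime

import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

# Find running workers by tag (tag key/value are assumptions).
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:role", "Values": ["airflow-worker"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

# Average each worker's CPU over the last 5 minutes.
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(minutes=5)
samples = []
for instance_id in instance_ids:
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )["Datapoints"]
    samples.extend(point["Average"] for point in datapoints)

# Publish the fleet-wide average; a scale-out/scale-in alarm can watch this.
if samples:
    cloudwatch.put_metric_data(
        Namespace="Airflow/Workers",  # custom namespace, an assumption
        MetricData=[{
            "MetricName": "AverageWorkerCPU",
            "Value": sum(samples) / len(samples),
            "Unit": "Percent",
        }],
    )
```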
Upvotes: 4