user1502505

Reputation: 884

Apache Airflow - Multiple deployment environments

When handling multiple environments (such as Dev/Staging/Prod), having a separate (preferably identical) Airflow instance for each of them would be the ideal scenario.

I'm using GCP's managed Airflow (Cloud Composer), which is not cheap to run, and having multiple instances would increase our monthly bill significantly.

So, I'd like to know if anyone has recommendations on using a single Airflow instance to handle multiple environments?

One approach I was considering was to have separate top-level folders within my dags folder corresponding to each environment (i.e. dags/dev, dags/prod, etc.) and copy my DAG scripts into the relevant folder through the CI/CD pipeline.

So, within my source code repository if my dag looks like:
airflow_dags/my_dag_A.py

During the CI stage, I could have a build step that creates 2 separate versions of this file:
airflow_dags/DEV/my_dag_A.py
airflow_dags/PROD/my_dag_A.py

I would follow a strict naming convention for my DAGs, Airflow Variables, etc. to reflect the environment name, so that the above build step can automatically rename them accordingly.
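To make that build step concrete, here is a minimal sketch of what it could look like. It assumes an __ENV__ placeholder convention in the DAG source that the CI step substitutes per environment; the placeholder, folder names, and file names are illustrative, not something from my actual pipeline:

```python
# CI build step sketch: copy each DAG file into a DEV/ and a PROD/ subfolder,
# replacing the __ENV__ placeholder so DAG ids and Variable names pick up the
# environment prefix (e.g. dag_id="__ENV___my_dag_A" -> "DEV_my_dag_A").
from pathlib import Path

SRC_DIR = Path("airflow_dags")
ENVIRONMENTS = ["DEV", "PROD"]

for dag_file in SRC_DIR.glob("*.py"):
    source = dag_file.read_text()
    for env in ENVIRONMENTS:
        target_dir = SRC_DIR / env
        target_dir.mkdir(exist_ok=True)
        (target_dir / dag_file.name).write_text(source.replace("__ENV__", env))
```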

I wanted to check whether this is an approach others have used, or whether there are better/alternative suggestions?

Please let me know if any additional clarifications are needed.
Thank you in advance for your support. Highly appreciated.

Upvotes: 1

Views: 2183

Answers (1)

Mazlum Tosun

Reputation: 6572

I think a shared environment can be a good approach because it's cost effective.

However, if you have a Composer cluster per environment, it's simpler to manage and it allows better separation.

If you stay on a shared environment, I think you are heading in the right direction with a separation in the Composer DAGs bucket and a folder per environment.
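As a rough sketch, your CI/CD pipeline could then push each environment folder into the Composer DAGs bucket with the google-cloud-storage client; the bucket name and paths below are placeholders, not your actual setup:

```python
# Deploy step sketch: upload each environment folder into the Composer DAGs bucket,
# keeping the dags/DEV/ and dags/PROD/ separation.
from pathlib import Path

from google.cloud import storage

COMPOSER_BUCKET = "your-composer-dags-bucket"  # placeholder bucket name

client = storage.Client()
bucket = client.bucket(COMPOSER_BUCKET)

for env in ("DEV", "PROD"):
    for local_file in Path("airflow_dags", env).glob("*.py"):
        blob = bucket.blob(f"dags/{env}/{local_file.name}")
        blob.upload_from_filename(str(local_file))
        print(f"uploaded {local_file} -> gs://{COMPOSER_BUCKET}/{blob.name}")
```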

If you use Airflow Variables, you also have to handle the environment there, in addition to the DAGs.
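For example, a small helper could namespace the Variable keys by environment, assuming the environment is taken from the folder the DAG file was deployed into; the helper name and the "bq_dataset" key are illustrative:

```python
# Sketch: read Airflow Variables prefixed by environment, so dev and prod values
# can coexist in a single Airflow instance (e.g. DEV_bq_dataset / PROD_bq_dataset).
from pathlib import Path

from airflow.models import Variable


def env_var(key: str, dag_file: str) -> str:
    """Read an Airflow Variable namespaced by the environment folder name."""
    env = Path(dag_file).parent.name  # "DEV" or "PROD", from dags/DEV/ or dags/PROD/
    return Variable.get(f"{env}_{key}")


# Inside a DAG file deployed to dags/DEV/ or dags/PROD/:
# dataset = env_var("bq_dataset", __file__)
```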

You can then manage access to each folder in the bucket.

In my team, we chose another approach. Cloud Composer 2 uses GKE in Autopilot mode and is more cost effective than the previous version.

It's also easier to manage the environment size of the cluster and tune different parameters (workers, CPU, webserver...).

In our case, we created a cluster per environment, but with a different configuration per environment (managed by Terraform):

  • For the dev and uat environments, we use a small sizing with the environment size set to Small
  • For the prod environment, we use a higher sizing with the environment size set to Medium

It's not perfect, but it allows us to strike a compromise between cost and separation.

Upvotes: 4
