Reputation: 45
I'm running Airflow 2.2.1 on Docker via the provided official image. Now, the documentation says that:
Airflow comes with an SQLite backend by default. This allows the user to run Airflow without any external database. However, such a setup is meant to be used for testing purposes only; running the default setup in production can lead to data loss in multiple scenarios. If you want to run production-grade Airflow, make sure you configure the backend to be an external database such as PostgreSQL or MySQL.
However, the docker-compose.yaml given in the documentation (see Running Airflow in Docker) makes no mention of SQLite. In fact, it defines (among other things):
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
So now I'm confused. Why does the documentation claim SQLite is the default backend? Isn't the default backend a Postgres database? Be that as it may, should I be happy using the Postgres database provided here for production use, or should I set up my own external database as a backend, as the docs suggest (see here)? In fact, I don't quite understand how the default backend can be a Postgres DB in the first place: does it also run inside a separate container?
Upvotes: 0
Views: 2103
Reputation: 18854
Airflow (without the docker-compose file or Helm Chart) uses SQLite by default.
However, the official docker-compose file creates a Postgres container and points Airflow at it, because SQLite only works with the SequentialExecutor and therefore cannot run tasks in parallel.
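You can see the "separate container" part in the connection string itself: the host `postgres` is simply the name of the database service defined in the same docker-compose.yaml, which Docker's internal DNS resolves to that container. A quick sketch of what the URI encodes, using only the Python standard library (nothing Airflow-specific):

```python
from urllib.parse import urlparse

# Metadata DB connection string from the official docker-compose.yaml
conn = "postgresql+psycopg2://airflow:airflow@postgres/airflow"

parts = urlparse(conn)
print(parts.scheme)            # SQLAlchemy dialect+driver: postgresql+psycopg2
print(parts.hostname)          # "postgres" -- the compose service name, i.e. another container
print(parts.path.lstrip("/"))  # database name: "airflow"
```

So "default backend" refers to a bare Airflow install; the compose file explicitly overrides it with this Postgres URI.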
Similarly, the official Helm Chart also creates a Postgres service for quick testing.
However, for a production use case you should not run the database inside a container: containers are designed to be ephemeral, and losing the metadata DB means losing your Airflow state. For production, use a managed database such as Cloud SQL (on GCP). Details: https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#database
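If you do point Airflow at an external database, the usual approach is to override the same environment variables the compose file already sets. A sketch (the host, port, and credentials below are placeholders for your managed instance; in Airflow 2.2 the setting still lives under the `core` section):

```yaml
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:YOUR_PASSWORD@your-db-host:5432/airflow
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:YOUR_PASSWORD@your-db-host:5432/airflow
```

You would then drop the bundled `postgres` service (and the `depends_on` entries pointing at it) from the compose file, since Airflow no longer needs it.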
Upvotes: 4