Reputation: 10441
We use Airflow to orchestrate our workflows and dbt for our daily transformations in BigQuery. We have two separate git repos: one for our dbt project and one for Airflow.

The simplest approach to scheduling our daily dbt run appears to be a BashOperator in Airflow. However, to schedule dbt to run with Airflow that way, it seems our entire dbt project would need to be nested inside our Airflow project so that we can point to it in the dbt run bash command.

Is it possible to trigger our dbt run and dbt test without moving our dbt directory inside our Airflow directory? With the airflow-dbt package, for the dir in the default_args, maybe it is possible to point to the GitHub link for the dbt project here?
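For context, this is roughly what we have in mind with airflow-dbt; the dir value below is a local path placeholder, since we don't know whether a GitHub link would be accepted there:

```python
# Sketch only: the dir is a local-path placeholder for our dbt checkout.
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtRunOperator, DbtTestOperator

default_args = {
    "dir": "/path/to/our/dbt/project",  # placeholder: local copy of the dbt repo
    "start_date": datetime(2021, 1, 1),
}

with DAG("daily_dbt", default_args=default_args, schedule_interval="@daily", catchup=False) as dag:
    dbt_run = DbtRunOperator(task_id="dbt_run")
    dbt_test = DbtTestOperator(task_id="dbt_test")
    dbt_run >> dbt_test
```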
Upvotes: 7
Views: 4950
Reputation: 10441
Accepted the other answer based on the consensus via upvotes and the supporting comment; however, this is a second option we're currently using:

- The dbt and airflow repos / directories sit next to each other.
- In our docker-compose.yml, we've added our dbt directory as a volume so that Airflow has access to it.
- In our Dockerfile, we install dbt and copy our dbt code.
- We use a BashOperator to run dbt run and dbt test (a sketch follows this list).
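Roughly, the DAG side of this looks like the sketch below; /opt/airflow/dbt is just an illustrative mount point for the volume we declare in docker-compose.yml:

```python
# Sketch only: /opt/airflow/dbt is a placeholder for wherever the
# docker-compose volume mounts the dbt project inside the Airflow containers.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator on Airflow 1.10

DBT_DIR = "/opt/airflow/dbt"  # assumed mount point of the dbt volume

with DAG("dbt_daily", start_date=datetime(2021, 1, 1), schedule_interval="@daily", catchup=False) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"cd {DBT_DIR} && dbt run",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"cd {DBT_DIR} && dbt test",
    )
    dbt_run >> dbt_test
```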
Upvotes: 3
Reputation: 541
Since you're on GCP, another option that is completely serverless is to run dbt with Cloud Build instead of Airflow. You can also add Workflows to that if you want more orchestration. There's a post describing it in detail: https://robertsahlin.com/serverless-dbt-on-google-cloud-platform/
Upvotes: 0
Reputation: 5717
My advice would be to leave your dbt and airflow codebases separated. There is indeed a better way:

- package your dbt code into a Docker image
- use the DockerOperator in your airflow DAG to run that Docker image with your dbt code (see the sketch below)

I'm assuming that you use the airflow LocalExecutor here and that you want to execute your dbt run workload on the server where airflow is running. If that's not the case and you have access to a Kubernetes cluster, I would suggest using the KubernetesPodOperator instead.
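A rough sketch of what that DAG task could look like; the image name is a placeholder for whatever image you build from your dbt repo, and on Airflow 1.10 the import path differs slightly:

```python
# Sketch only: "my-dbt-image" stands in for an image that contains your
# dbt project and profiles; adjust the command and flags to your setup.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator  # airflow.operators.docker_operator on Airflow 1.10

with DAG("dbt_docker", start_date=datetime(2021, 1, 1), schedule_interval="@daily", catchup=False) as dag:
    dbt_run = DockerOperator(
        task_id="dbt_run",
        image="my-dbt-image:latest",  # hypothetical image built from the dbt repo
        command="dbt run",
    )
```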
Upvotes: 11