Reputation: 41
We're using a dbt multi-repo setup with different projects for different business areas. We have several projects, something like this:
dbt_dwh
dbt_project1
dbt_project2
The dbt_dwh
project contains models which we plan to reference in projects 1 and 2 (we have ~10 projects that would reference the dbt_dwh
project) by way of installing git packages. Ideally, we'd like to be able to just reference the models in the dbt_dwh
project (e.g.
SELECT * FROM {{ ref('dbt_dwh', 'model_1') }}
). However, each of our projects sits in it's own database schema and this causes issue upon dbt run because dbt uses the target schema from dbt_project_x, where these objects don't exist. I've included example set-up info below, for clarity.
packages.yml file for dbt_project1
:
packages:
- git: https://git/repo/url/here/dbt_dwh.git
revision: master
profiles.yml for dbt_dwh
:
dbt_dwh:
target: dwh_dev
outputs:
dwh_dev:
<config rows here>
dwh_prod:
<config rows here>
profiles.yml for dbt_project1
:
dbt_project1:
target: project1_dev
outputs:
project1_dev:
<config rows here>
project1_prod:
<config rows here>
sf_orders.sql in dbt_dwh
:
{{
config(
materialized = 'table',
alias = 'sf_orders'
)
}}
SELECT * FROM {{ source('salesforce', 'orders') }} WHERE uid IS NOT NULL
revenue_model1.sql in dbt_project1
:
{{
config(
materialized = 'table',
alias = 'revenue_model1'
)
}}
SELECT * FROM {{ ref('dbt_dwh', 'sf_orders') }}
My expectation here was that dbt would examine the sf_orders
model and see that the default schema for the project it sits in (dbt_dwh
) is dwh_dev
, so it would construct the object reference as dwh_dev.sf_orders
.
However, if you use command dbt run -m revenue_model1
then the default dbt behaviour is to assume all models are located in the default target for dbt_project1
, so you get something like:
11:05:03 1 of 1 START sql table model project1_dev.revenue_model1 .................... [RUN]
11:05:04 1 of 1 ERROR creating sql table model project1_dev.revenue_model1 ........... [ERROR in 0.89s]
11:05:05
11:05:05 Completed with 1 error and 0 warnings:
11:05:05
11:05:05 Runtime Error in model revenue_model1 (folder\directory\revenue_model1.sql)
11:05:05 404 Not found: Table database_name.project1_dev.sf_orders was not found
I've got several questions here:
ref
function?dbt_dwh
project when this Git repo is installed as a package in another project?Some points to note:
source.yml
files in each of the dbt projects to reference the output objects of the dbt_dwh
project, this feels like duplication of effort and could result in different versions of the same sources.yml file across projectsschema
in the dbt config block, but this removes our ability to test in dev environment/schema for dbt_dwh
project
Upvotes: 2
Views: 7756
Reputation: 98
I had a similar use case as you. In our case, we could to explicitly define the +schema
for the upstream packages instead of the downstream one. Something like this:
dbt_dwh:
dbt_project.yml
models:
dbt_dwh:
marts:
+schema: *[define these here]*
...
then in our downstream packages, we could simply use ref like:
ref('dbt_dwh', 'mart_my_table')
Of course, the answer from the original author also works.
Upvotes: 2
Reputation: 1
I faced this same issue today.
After few hours reading and trying stuff I made it work by specifying the database (GCP project) in the config block in the first model (the one living in dbt_dwh project I guess).
It should look like this:
{{
config(
materialized = 'table',
alias = 'sf_orders',
database = 'dbt_dwh',
schema = 'bq_dataset_name' # Not sure if this one is needed but worth trying
)
}}
That worked for me. Maybe you can also try specifying the database in one of the schema.yml files or even the dbt_project.yml file or profiles.yml file.
Upvotes: 0
Reputation: 41
I managed to find a solution so I'll answer my own question in case anybody else runs up against the same issue. Unfortunately this is not documented anywhere that I can find, however, a throw-away comment in the dbt Slack workspace sparked an idea that allowed me to find the/a solution (I'll post the message if I manage to find it again, to give credit where it's due).
To fix this is actually very simple, you just need to add the project being imported to your dbt_project.yml
file and specify the schema. For our use case this is fine as we only have 1 schema we use.
dbt_project.yml for dbt_project1:
models:
db_project_1:
outputs:
project1_dev:
<configs here>
project1_prod:
<configs here>
dbt_dwh:
+schema: [[schema you want these models to run into]]
<configs here>
The advantages with this approach are:
dbt run -m +model_name --exclude dbt_dwh
(for example) to prevent models in your upstream project from running.I haven't yet figured out if it is possible to use the default parameters/settings for models inside the upstream project (in this case dbt_dwh
) but I will edit this answer if I find a way.
Upvotes: 1