Reputation: 285
My aim is to be able to have "dynamic" sources depending on the type of DBT run I am doing. To be more precise, I am trying to find a solution to perform end-to-end business testing of our DBT models. I don't mean schema or simple data tests, but business logic tests. Something like, I have some input tables with test data, I run the DBT models and then I assert the end tables contain the desired results. I can create all the target tables in a separate schema by using a different 'test' profile, but I still need to be able to select from a different set of sources, which would be the test tables I am creating with the test data.
I guess I can use jinja in the source files in combination with some variables to make this happen, but I am wondering if there is an even better way, where I can do it without changing the source files at all. That way, the developers would not have to worry about writing code that also works for the tests. For this purpose, I was wondering if we can override the source macro, or do something along these lines, to incorporate this behaviour, similar to how we override the generate_schema_name macro. Something along the lines of (in Python pseudocode):
def source(schema_name, table_name):
    if env('is_test') == True:
        return schema_name + table_name + '_test'
    else:
        return schema_name + table_name
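For reference, dbt does expose the original macro in the Jinja context as builtins.source, so a project-level override along these lines appears possible. A sketch (not verified against lineage/docs behaviour; the unit_test target name and the _test suffix convention are assumptions):

```sql
-- macros/source.sql (sketch: overrides the built-in source() macro;
-- dbt exposes the original implementation via builtins.source)
{% macro source(source_name, table_name) %}
    {% if target.name == 'unit_test' %}
        {{ return(builtins.source(source_name, table_name ~ '_test')) }}
    {% else %}
        {{ return(builtins.source(source_name, table_name)) }}
    {% endif %}
{% endmacro %}
```

Note that the _test tables would still need to be declared in sources.yml for the builtins.source lookup to resolve at parse time.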
I guess the complexity here is also that the source macro does more than that; for example, it sets some info for the lineage in the docs, which I definitely would like to keep.
Any suggestions outside of this method are more than welcome!
Upvotes: 9
Views: 11627
Reputation: 181
Credit to @tconbeer's answer above for the inspiration to use an inline if statement to define source credentials. I ended up using the below to help define where sources should come from depending on the profile being used.
sources:
  - name: source_name
    database: "{{ 'sandbox_user' if target.name == 'sandbox' else 'prod' }}"
    schema: "{{ 'sandbox_schema' if target.name == 'sandbox' else 'schema' }}"
Upvotes: 1
Reputation: 5805
You can make your sources "dynamic" (i.e., configurable at runtime) using jinja in your sources.yml file itself, without writing any new macros. dbt doesn't allow macros or other complex jinja in .yml files, but you can access var(), env_var(), and target, and use simple jinja conditionals to achieve this. You have to quote the jinja in your .yml file so the syntax doesn't collide with the .yml syntax.
unit_test target
I would use a different target for unit tests (what I'm calling the kind of business logic tests you're looking for). Let's call it unit_test. So in your profiles.yml:
my_project_profile:
  target: dev # or unit_test if you want that to be the default
  outputs:
    unit_test: # accesses unit test data
      ...
    dev: # accesses prod data
      ...
    prod:
      ...
By default, a source's name is the schema it selects from. However, you can override that behavior by using the schema key. So your sources.yml file becomes:
version: 2
sources:
  - name: my_first_source
    schema: "{{ 'unit_test_source' if target.name == 'unit_test' else 'my_first_source' }}"
    tags: ["has_unit_tests"]
    tables:
      - name: my_first_table
        ...
If you want to keep the unit test data in the same schema as the real data, you can use this same trick to change the source table's name at runtime:
version: 2
sources:
  - name: my_first_source
    tables:
      - name: my_first_table
        tags: ["has_unit_tests"]
        identifier: "{{ 'my_first_table_test' if target.name == 'unit_test' else 'my_first_table' }}"
        ...
You can then run dbt against your unit_test target; if you don't have unit test data built out for all models, you can select a subset of your DAG to test using tags (I've added the has_unit_tests tag to the source above):
$ dbt build -t unit_test -s tag:has_unit_tests
var or env_var
You can also access var and env_var. You could use these variables either to define the test relation identifiers, or just to set a flag for is_unit_test. The first option uses env_var and sets a default value for when the environment variable is unset:
version: 2
sources:
  - name: my_first_source
    schema: "{{ env_var('DBT_MY_FIRST_SOURCE_TEST_SCHEMA', 'my_first_source') }}"
    tags: ["has_unit_tests"]
    tables:
      - name: my_first_table
        ...
Then to run your tests:
$ export DBT_MY_FIRST_SOURCE_TEST_SCHEMA=my_first_source_test
$ dbt build -s tag:has_unit_tests
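The fallback behavior of env_var can be modeled in plain Python (a simplified sketch of how the default resolves; dbt's actual implementation also errors when a variable with no default is unset):

```python
import os

def env_var(name, default=None):
    # Simplified model of dbt's env_var(): return the environment
    # variable if it is set, otherwise fall back to the default.
    value = os.environ.get(name)
    if value is not None:
        return value
    if default is not None:
        return default
    raise RuntimeError(f"Env var required but not provided: '{name}'")

# With the variable unset, the production schema is selected:
os.environ.pop("DBT_MY_FIRST_SOURCE_TEST_SCHEMA", None)
print(env_var("DBT_MY_FIRST_SOURCE_TEST_SCHEMA", "my_first_source"))
# -> my_first_source

# After exporting it (as in the shell commands above), the test schema wins:
os.environ["DBT_MY_FIRST_SOURCE_TEST_SCHEMA"] = "my_first_source_test"
print(env_var("DBT_MY_FIRST_SOURCE_TEST_SCHEMA", "my_first_source"))
# -> my_first_source_test
```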
Upvotes: 3
Reputation: 256
Not positive if this is what you were looking for, but I've done some work with dynamic sources before. I needed to union 5 tables that lived in 5 separate schemas but had the same table name. I hope what I provide can at least get you moving in the right direction.
You have two tables:
my_database.my_schema.tableA
my_database.my_schema.tableA_test
So your sources file looks something like this:
# src.yml
sources:
  - name: my_schema
    database: my_database
    tables:
      - name: tableA
      - name: tableA_test
You asked if there was any way to override existing macros like the generate macros. Unfortunately, I personally did not implement it that way for my situation, because I didn't want to apply it at a larger scale (I only needed it for a small portion of models).
Then I have my model:
-- my_model.sql
{%- set tbl = get_source('tableA') -%}

select
    '{{ tbl }}' as source,
    column1,
    column2,
    column3
from {{ source('my_schema', tbl) }}
and that model calls on this macro to adjust the table name:
-- my_macro.sql
{% macro get_source(table_name) %}
    {% if target.name == 'test' %}
        {% set output = table_name ~ '_test' %}
    {% else %}
        {% set output = table_name %}
    {% endif %}
    {{ return(output) }}
{% endmacro %}
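The macro's renaming logic, mirrored in plain Python for clarity (the 'test' target name follows the macro above; the second argument stands in for target.name):

```python
def get_source(table_name, target_name):
    # Mirror of the jinja macro: append '_test' when running against
    # the test target, otherwise keep the table name as-is.
    return table_name + "_test" if target_name == "test" else table_name

print(get_source("tableA", "test"))  # -> tableA_test
print(get_source("tableA", "prod"))  # -> tableA
```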
I have not tested this exactly, but I've gotten some version of dynamic schemas to work, so there is potential here. I may have misunderstood the ask too, so let me know if this is completely off.
Upvotes: 1
Reputation: 316
In my opinion, you should have a separate database (or databases) for testing, plus a variable for configuring the database name in your dbt_project.yml file.
Then, when you run the tests, you can use the --vars option.
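A minimal sketch of that setup, assuming a hypothetical variable named source_database (the database and source names here are placeholders, not from the answer):

```yaml
# dbt_project.yml
vars:
  source_database: prod_db  # production default
```

```yaml
# sources.yml
version: 2
sources:
  - name: my_source
    database: "{{ var('source_database') }}"
```

Then override the variable at runtime for a test run:

```shell
dbt build --vars '{"source_database": "test_db"}'
```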
Upvotes: 1