Catazza
Catazza

Reputation: 285

Overriding source macro in DBT to allow for dynamic sources for test runs

My aim is to be able to have "dynamic" sources depending on the type of DBT run I am doing. To be more precise, I am trying to find a solution to perform end-to-end business testing of our DBT models. I don't mean schema or simple data tests, but business logic tests. Something like, I have some input tables with test data, I run the DBT models and then I assert the end tables contain the desired results. I can create all the target tables in a separate schema by using a different 'test' profile, but I still need to be able to select from a different set of sources, which would be the test tables I am creating with the test data.

I guess I can use jinja in the source files in combination with some variables to make this happen, but I am wondering if there is an even better way where I can do it without changing the source files at all. Like, the developers would not have to worry about writing code that also works for the tests. For this purpose, I was wondering if we can override the source macro, or do something along these lines, to incorporate this behaviour - similar to when we override the generate_schema_name macro. Something along the lines of (in python pseudocode):

def source(schema_name, table_name):
    if env('is_test') == true:
        return schema_name + table_name + '_test'
    else:
        return schema_name + table_name

I guess the complexity here is also that the source macro does more than that, for example sets some info for the lineage for the docs, which I definitely would like to keep.

Any suggestion outside of this method are more than welcome!

Upvotes: 9

Views: 11627

Answers (4)

CallumO
CallumO

Reputation: 181

Credit to @tconbeer's answer above for the inspiration for using an inline if statement to define source credentials. I ended up using the below to help define where sources should come from depending on the profile being used.

sources:
  - name: source_name
    database: "{{ 'sandbox_user'if target.name == 'sandbox' else 'prod' }}"
    schema: "{{ 'sandbox_schema' if target.name == 'sandbox' else 'schema' }}"

Upvotes: 1

tconbeer
tconbeer

Reputation: 5805

You can make your sources "dynamic" (i.e., configurable at runtime) using jinja in your sources.yml file itself, without writing any new macros. dbt doesn't allow macros or other complex jinja in .yml files, but you can access var(), env_var(), target, and use simple jinja conditionals to achieve this. You have to quote the jinja in your .yml file so the syntax doesn't collide with .yml syntax.

Using a unit_test target

I would use a different target for unit tests (what I'm calling the kind of business logic tests you're looking for). Let's call it unit_test. So in your profiles.yml:

profile: my_project_profile
target: dev # or unit_test if you want that to be the default
outputs:
  - name: unit_test # accesses unit test data
    ...
  - name: dev # accesses prod data
    ...
  - name: prod
    ...

By default, a source's name is the schema it selects from. However, you can override that behavior by using the schema key. So your sources.yml file becomes:

version: 2

sources:
  - name: my_first_source
    schema: "{{ 'unit_test_source' if target.name == 'unit_test' else 'my_first_source' }}"
    tags: ["has_unit_tests"]
    tables:
      - name: my_first_table
        ...

If you want to keep the unit test data in the same schema as the real data, you can use this same trick to change the source table's name at runtime:

version: 2

sources:
  - name: my_first_source
    tables:
      - name: my_first_table
        tags: ["has_unit_tests"]
        identifier: "{{ 'my_first_table_test' if target.name == 'unit_test' else 'my_first_table' }}"
        ...

You can then run dbt against your unit_test target; if you don't have unit test data built out for all models, you can select a subset of your dag to test using tags (I've added the has_unit_tests tag to the source above)

$ dbt build -t unit_test -s tag:has_unit_tests

Using var or env_var

You can also access var and env_var. You could use these variables to either define the test relation identifiers, or just set a flag for is_unit_test. The first option, using env_var and setting a default value for when the environment variable is unset:

version: 2

sources:
  - name: my_first_source
    schema: "{{ env_var('DBT_MY_FIRST_SOURCE_TEST_SCHEMA', 'my_first_source') }}"
    tags: ["has_unit_tests"]
    tables:
      - name: my_first_table
        ...

Then to run your tests:

$ export DBT_MY_FIRST_SOURCE_TEST_SCHEMA=my_first_source_test
$ dbt build -s tag:has_unit_tests

Upvotes: 3

anna
anna

Reputation: 256

Not positive if this is what you were looking for, but I've done some work with dynamic sources before. I needed to union 5 tables that lived in 5 separate schemas but had the same table name. I hope what I provide can at least get you moving in the right direction.

You have two tables:

my_database.my_schema.tableA

my_database.my_schema.tableA_test

So your sources file looks something like this:

-- src.yml

sources:
  - name: my_schema
    database: my_database
    tables:
      - name: tableA
      - name: tableA_test

You asked if there was any way to override existing macros like the generate macros. Unfortunately, I personally did not implement it that way for my situation because I didn't want to apply it to a larger scale (just a small portion of models).

Then I have my model:

-- my_model.sql

{%- set tbl = get_source(table_name) -%}

select 
    '{{tbl}}' as source,
    column1,
    column2,
    column3
from {{ source( 'my_schema' , tbl ) }} i

and that model calls on this macro to adjust the table name:

-- my_macro.sql

{% macro get_source(table_name) %}

{% if target.name = 'test' %}
{% set output = table_name ~ '_test' %}
{% else %}
{% set output = table_name %}
{% endif %}

{{ return(output) }}

{% endmacro %}

I have not tested this exactly, but I've gotten some version of dynamic schemas to work, so there is potential here. I may have misunderstood the ask too, so let me know if this is completely off.

Upvotes: 1

Dat Nguyen
Dat Nguyen

Reputation: 316

My opinion that you should have separate database(s) for testing, plus having a variable for configuring the database name in dbt_project.yml file

So when to run the tests, you should be able to use --vars option.

Upvotes: 1

Related Questions