Reputation: 2076
I have a bunch of dbt models that share about 90% of their structure. The idea is that these models will be combined into a single unified downstream model during the dbt run. Currently, the tests for these models contain a lot of duplication. For example:
- name: model1
  columns:
    - name: colA
      tests:
        - accepted_values:
            values: ['a', 'b']
    - name: colB
      tests:
        - not_null
- name: model2
  columns:
    - name: colA
      tests:
        - accepted_values:
            values: ['a', 'b', 'c']
    - name: colB
      tests:
        - not_null
I'd like to reduce the duplication in the schema.yml file by reusing the test config with small variations.
What I have tried so far:
Defining the tests as a var in dbt_project.yml and referencing it in schema.yml. This works, but you cannot have any variation.
Defining a macro that returns a Python list containing the test config, and calling the macro like this:
columns: "{{ common_tests() }}"
This doesn't work; I get the error: could not render {{ common_tests() }} 'common_tests' is undefined.
Interestingly, it is possible to render YAML with a macro inside individual tests in the YAML file, just not at the top level.
I feel there should be an easy(ish) solution here, but I'm just not finding it. Thanks in advance.
Upvotes: 2
Views: 1364
Reputation: 5815
If you don’t mind defining all these models in a single .yml file, you can use YAML anchors for this.
Josh Devlin has a nice write-up here:
version: 2
models:
  - name: model_one
    columns:
      - name: id
        tests: &unique_not_null
          - unique
          - not_null
      - name: col_a
      - name: col_b
  - name: model_two
    columns:
      - name: id
        tests: *unique_not_null
      - name: col_c
      - name: col_d
Josh’s example shows an anchor on the tests key for a single column, but you could also put an anchor on the columns key. That doesn’t work so well, though: the merge operator (<<) only merges mappings, so even with it you would need to repeat the whole list if there is a single change in a single test. YAML has no equivalent for extending or partially overriding lists or list items, which is really what you need here.
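As a rough sketch of how this could look for the models in the question: the anchor names col_a and col_b are made up for illustration, and it assumes dbt's YAML loader honors merge keys (PyYAML does). Each shared column is anchored as a whole mapping. model2 reuses colB unchanged, but to change one accepted value on colA it has to restate the entire tests list, which is the limitation described above.

```yaml
version: 2
models:
  - name: model1
    columns:
      # Anchor each column mapping at its first use (anchor names are hypothetical).
      - &col_a
        name: colA
        tests:
          - accepted_values:
              values: ['a', 'b']
      - &col_b
        name: colB
        tests:
          - not_null
  - name: model2
    columns:
      # Merge in col_a, then override its tests key. The override replaces
      # the whole list; it cannot patch a single item inside it.
      - <<: *col_a
        tests:
          - accepted_values:
              values: ['a', 'b', 'c']
      # Reuse col_b exactly as defined above.
      - *col_b
```

Because an anchor must be defined before it is referenced, and both must live in the same file, this only helps when all the models are declared in one .yml file.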
Upvotes: 6