Reputation: 725
I'm creating a master events table, where almost all of the same columns are in every table, but a few tables are missing one or two columns. In those cases I'd like to replace those columns with null
whenever the column doesn't exist in the table. When I run the code below, every cell in the output table is NULL.
Assume columns 1 and 2 are in every table, but column 3 is in table 1 and 2 but not 3.
{{ config(schema='MYSCHEMA', materialized='table') }}
{% set tables = ['table1', 'table2', 'table3'] %}
{% set possible_columns = ['col1', 'col2', 'col3'] %}
{% for table in tables %}
{%- set table_columns = adapter.get_columns_in_relation( ref(table) ) -%}
select
{% for pc in possible_columns %}
{% if not loop.last -%}
{% if pc in table_columns %}
{{ pc }},
{% else %}
null as {{ pc }},
{%- endif %}
{% else %}
{% if pc in table_columns %}
{{ pc }}
{% else %}
null as {{ pc }}
{%- endif %}
{% endif %}
{%- endfor %}
from
{{ ref(table) }}
{% if not loop.last -%}
union all
{%- endif %}
{% endfor %}
Upvotes: 1
Views: 2629
Reputation: 5805
I'd recommend using dbt_utils.union_relations
for this. It does exactly what you describe -- it creates a superset of columns from all the tables and fills in nulls where appropriate.
{{ dbt_utils.union_relations(
relations=[
ref('table1'),
ref('table2'),
ref('table3'),
],
include = ['col1', 'col2', 'col3']
) }}
BUT, if you want to roll your own...
The problem with your code is this line:
{% if pc in table_columns %}
adapter.get_columns_in_relation
returns a list of Column objects, not a list of strings. To compare pc
to the name of a column, you could use:
{% set cols = [] %}
{% for col in table_columns %}
{% do cols.append(col.name) %}
{% endfor %}
...
{% if pc in cols %}
You can also remove a bunch of redundant code by pushing down the if not loop.last
block to just the comma, so this all becomes:
{% for table in tables %}
{%- set table_columns = adapter.get_columns_in_relation( ref(table) ) -%}
{% set cols = [] %}
{% for col in table_columns %}
{% do cols.append(col.name) %}
{% endfor %}
select
{% for pc in possible_columns %}
{% if pc in cols %}
{{ pc -}}
{% else %}
null as {{ pc -}}
{%- endif %}{%- if not loop.last -%},{% endif %}
{%- endfor %}
...
Upvotes: 1