Matt Elgazar
Matt Elgazar

Reputation: 725

Jinja DBT for loop union with some different columns across tables

I'm creating a master events table, where almost all of the same columns are in every table, but a few tables are missing one or two columns. In those cases I'd like to replace those columns with null whenever the column doesn't exist in the table. When I run the code below, every cell in the output table is NULL.

Assume columns 1 and 2 are in every table, but column 3 is in table 1 and 2 but not 3.

{{ config(schema='MYSCHEMA', materialized='table') }}

{% set tables = ['table1', 'table2', 'table3'] %}

{% set possible_columns = ['col1', 'col2', 'col3'] %}

{% for table in tables %}
  {%- set table_columns = adapter.get_columns_in_relation( ref(table) ) -%}
  select
    {% for pc in possible_columns %}

      {% if not loop.last -%}

        {% if pc in table_columns %}
          {{ pc }},
        {% else %}
          null as {{ pc }},
        {%- endif %}

      {% else %}
        {% if pc in table_columns %}
          {{ pc }}
        {% else %}
          null as {{ pc }}
        {%- endif %}

      {% endif %}

    {%- endfor %}

  from
    {{ ref(table) }}

  {% if not loop.last -%}
    union all
  {%- endif %}
{% endfor %}

Upvotes: 1

Views: 2629

Answers (1)

tconbeer
tconbeer

Reputation: 5805

I'd recommend using dbt_utils.union_relations for this. It does exactly what you describe -- it creates a superset of columns from all the tables and fills in nulls where appropriate.

{{ dbt_utils.union_relations(
    relations=[
        ref('table1'),
        ref('table2'),
        ref('table3'),
    ],
    include = ['col1', 'col2', 'col3']
) }}

BUT, if you want to roll your own...

The problem with your code is this line:

{% if pc in table_columns %}

adapter.get_columns_in_relation returns a list of Column objects, not a list of strings. To compare pc to the name of a column, you could use:

{% set cols = [] %}
{% for col in table_columns %}
    {% do cols.append(col.name) %}
{% endfor %}
...
{% if pc in cols %}

You can also remove a bunch of redundant code by pushing down the if not loop.last block to just the comma, so this all becomes:

{% for table in tables %}
    {%- set table_columns = adapter.get_columns_in_relation( ref(table) ) -%}
    {% set cols = [] %}
    {% for col in table_columns %}
        {% do cols.append(col.name) %}
    {% endfor %}
    select
    {% for pc in possible_columns %}
        {% if pc in cols %}
            {{ pc -}}
        {% else %}
            null as {{ pc -}}
        {%- endif %}{%- if not loop.last -%},{% endif %}
    {%- endfor %}
    ...

Upvotes: 1

Related Questions