Niks A

Reputation: 101

Will dbt support temp table creation, e.g. create table #temp1 as select * from tab1, or does it only work the CTE way?

I found a way to handle temp tables in dbt: create them all in a pre-hook, then reference the final temp table outside of the pre-hook. I tested this and it works fine; it reduced the run time from more than 20 minutes to about 1 minute. But there is one problem: these models don't show up in the lineage graph in the dbt docs. Is there any way to handle temp tables other than a pre-hook, one that preserves lineage in the docs?
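For context, the pre-hook setup described above might look roughly like this (a sketch only; the model, temp table, and source table names are placeholders, and the exact temp-table syntax depends on your warehouse):

```sql
-- models/final_model.sql (hypothetical name)
{{ config(
    pre_hook=[
        "create temporary table temp1 as select * from tab1",
        "create temporary table temp2 as select t1.* from temp1 t1"
    ]
) }}

-- The model body reads the last temp table. This works because the
-- pre-hook statements run on the same connection as the model itself,
-- so the session-scoped temp tables are still visible here.
select * from temp2
```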

Upvotes: 8

Views: 13744

Answers (3)

Claire Carroll

Reputation: 759

You're right in thinking that dbt does not support temporary tables. That's because temporary tables only persist in a single session, and dbt opens one connection/session per thread. Therefore any temporary tables created on one thread would not be visible to a model running on a different thread.

It sounds like CTEs are a performance drag for you though — out of interest, which warehouse are you using?

You've identified two workarounds, and there's another one worth discussing:

Option 1: Materialize your model as CTEs using the ephemeral materialization (docs)

Pros:

  • The models show up in the lineage graph
  • You can re-use these transformations in multiple downstream models by ref-ing them
  • You can test and document these models

Cons:

  • At some point there is a performance degradation with too many stacked CTEs (especially on older versions of postgres, where CTEs are an optimization fence)
  • Compiled SQL can be harder to debug
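A minimal sketch of Option 1 (the model and column names here are made up):

```sql
-- models/staging/stg_orders.sql (hypothetical)
{{ config(materialized='ephemeral') }}

select id, amount
from raw.orders
```

Downstream models then ref it, and dbt inlines it as a CTE in the compiled SQL:

```sql
-- models/orders_summary.sql (hypothetical)
select count(*) as order_count, sum(amount) as total_amount
from {{ ref('stg_orders') }}
```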

Option 2: Use pre-hooks to create temp tables

I would generally recommend against this — you can't test or document your models, and they won't be in the lineage graph (as you've noted).

Option 3: Materialize these models as tables in a separate schema, and drop the schema at the end of a run

I think Michael's suggestion is a good one! I'd tweak it just a little bit:

  1. Use the schema config to materialize a model in a separate schema:

     {{ config(
       materialized='table',
       schema='my_temporary_schema'
     ) }}

  2. Then, at the end of a run, use an on-run-end hook (docs) to drop that schema — in your dbt_project.yml:

     on-run-end: "drop schema my_temporary_schema cascade"
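One wrinkle worth noting: by default, dbt's generate_schema_name macro prefixes a custom schema with the target schema (e.g. analytics_my_temporary_schema), so the hook may need to account for that. A hedged sketch for dbt_project.yml, assuming the default schema-naming behavior:

```yaml
# dbt_project.yml (sketch; schema name matches the config above)
on-run-end:
  - "drop schema if exists {{ target.schema }}_my_temporary_schema cascade"
```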

Pros:

  • All the benefits of Option 1
  • Sounds like it might be more performant than using CTEs

Cons:

  • Make sure you don't have any dependent views on top of that schema: they will be dropped when the drop ... cascade command runs, which introduces some fragility into your project.

Upvotes: 14

luther

Reputation: 264

Two solutions:

  1. Create a source (SRC) model so that you can separate the source from the staging models.
  2. Build from the source using a staging model directly.

These should both show up on the lineage graphs.
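As a sketch of what that might look like (the source and model names are assumptions):

```yaml
# models/sources.yml (hypothetical)
version: 2
sources:
  - name: raw
    tables:
      - name: tab1
```

```sql
-- models/staging/stg_tab1.sql (hypothetical)
select * from {{ source('raw', 'tab1') }}
```

Both the source and the staging model then appear as nodes in the lineage graph.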

Upvotes: 0

Michael Irvine

Reputation: 1

I think there are two ways you could get the same result while preserving lineage in the documentation:

  1. Write each temp table as a separate model where the logic happens in the pre_hook (like you suggested) and the model itself is just a view whose body is select * from YOUR_TEMP_TABLE_NAME.
  2. Instead of using temp tables, create each table as a regular model, and just drop them in the post_hook of the "final" model that uses them or in the on-run-end of your dbt_project.yml.
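A rough sketch of the second option (the model names are placeholders; note that ref inside a hook also registers the dependency, so the intermediate model is built before this one runs):

```sql
-- models/final_model.sql (hypothetical)
{{ config(
    materialized='table',
    post_hook="drop table if exists {{ ref('intermediate_model') }}"
) }}

-- intermediate_model is a regular dbt model, so it shows up in lineage;
-- the post_hook cleans it up once this "final" model has been built.
select * from {{ ref('intermediate_model') }}
```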

Upvotes: 0
