highDopamine
highDopamine

Reputation: 97

How to use CETAS (Synapse Serverless Pool) in dbt?

In Synapse Serverless Pool, I can use CETAS to create external table and export the results to the Azure Data Lake Storage.

CREATE EXTERNAL TABLE external_table
WITH (
    LOCATION = 'location/',
    DATA_SOURCE = staging_zone,  
    FILE_FORMAT = SynapseParquetFormat
)  
AS
SELECT * FROM table

It will create an external table name external_table in Synapse and write a parquet file to my staging zone in Azure Data Lake.

How can I do this in dbt?

Upvotes: 1

Views: 1372

Answers (2)

highDopamine
highDopamine

Reputation: 97

I wrote a materialization for CETAS (Synapse Serverless Pool) here: https://github.com/intheroom/dbt-synapse-serverless

It's a forked from dbt-synapse-serverless here: https://github.com/dbt-msft/dbt-synapse-serverless

Also you can use hooks in dbt to use CETAS.

Upvotes: 0

Jay Swanson
Jay Swanson

Reputation: 41

I was trying to do something very similar and run my dbt project with Synapse Serverless Pool, but ran into several issues. Ultimately I was mislead by CETAS. When you create the external table it creates a folder hierarchy, in which it places the parquet file. If you were to run the same script like the one you have as an example it fails because you cannot overwrite with CETAS. So dbt would be able to run it like any other model, but it wouldn't be easy to overwrite. Maybe if you dynamically made a new parquet every time the script is run and deleted the old one, but that seems like putting a small bandage on the hemorrhaging wound that is the synapse and severless pool interaction. I had to switch up my architecture for this reason.

I was trying to export as a parquet to maintain the column datatypes and descriptions so I didn't have to re-schematize. Also so I could create tables based of incremental points in my pipeline. I ended up finding a way to pull from a database that already had the datatype schemas, using the dbt-synapse adapter. Then if I needed an incremental table, I could materialize it as a table via dbt and dbt-synapse and access it that way.

What is your goal with the exported parquet file?

Maybe we can find another solution?

Here's the dbt-synapse-serverless adapter github where it lists caveats for serverless pools.

Upvotes: 1

Related Questions