xmlapi
xmlapi

Reputation: 71

Azure Synapse Pipeline - Create DDL on the fly

We have a Azure Synapse Pipeline that brings in data as CSV Files from our On Premise SQL Server to Data Lakes using a Copy Activity. From Data Lakes, it gets imported into our Azure SQL Pool using another Copy Activity. It's about 100 tables (200gigs of data).

The problem is that the On Premise table structures are subject to change at anytime. Therefore, what we want to do is recreate the table structures in the SQL Pool each time we bring in the data.

I know there is a CTAS (Create Table AS) technique to create the table from Data Lakes but the problem is that because it's CSV files, it doesn't have any type information. We don't want to use Parquet since it's doesn't have all the types (we want to keep the same types as On Premise tables).

Is there are a solution for this? I'm thinking about querying the Information schema to get the DDL for the tables but is there a better way?

Upvotes: 0

Views: 692

Answers (1)

As per my knowledge the solution is to use the POLYBASE feature in Azure SQL Pool to create external tables that points to the CSV files in the Data Lake.

In this way you can define the table structure in SQL Pool.and map the columns to the CSV file columns. Then you can use CTAS to create a new internal table with the same structure as the external table.

To create the external table, you can use the CREATE EXTERNAL TABLE statement, specifying the column names, data types, and file format options.

Syntax to create External table based on Microsoft documentaition.

CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
( [ LOCATION = '<prefix>://<path>[:<port>]' ]
  [ [ , ] CREDENTIAL = <credential_name> ]
  [ [ , ] TYPE = HADOOP ]
)
[ ; ]

Example to create external table. refer this example this

 CREATE EXTERNAL TABLE csv_table
    (
       column1 INT,
       column2 VARCHAR(50),
       column3 DATE
    )
    WITH
    (
        LOCATION='/datalake/csv_files',
        DATA_SOURCE = AzureDataLakeStorage,
        FILE_FORMAT = CSV_FILE_FORMAT,
       )
    

Source: enter image description here

Sink: enter image description here

As we are using the Polybase feature. you will need to enable the staging as well. enter image description here

enter image description here

enter image description here

Once you have created the external table, you can use the CTAS statement to create a new internal table with the same structure as the external table.

Upvotes: 0

Related Questions