Reputation: 71
We have a Azure Synapse Pipeline that brings in data as CSV Files from our On Premise SQL Server to Data Lakes using a Copy Activity. From Data Lakes, it gets imported into our Azure SQL Pool using another Copy Activity. It's about 100 tables (200gigs of data).
The problem is that the On Premise table structures are subject to change at anytime. Therefore, what we want to do is recreate the table structures in the SQL Pool each time we bring in the data.
I know there is a CTAS (Create Table AS) technique to create the table from Data Lakes but the problem is that because it's CSV files, it doesn't have any type information. We don't want to use Parquet since it's doesn't have all the types (we want to keep the same types as On Premise tables).
Is there are a solution for this? I'm thinking about querying the Information schema to get the DDL for the tables but is there a better way?
Upvotes: 0
Views: 692
Reputation: 3145
As per my knowledge the solution is to use the POLYBASE feature in Azure SQL Pool to create external tables that points to the CSV files in the Data Lake.
In this way you can define the table structure in SQL Pool.and map the columns to the CSV file columns. Then you can use CTAS to create a new internal table with the same structure as the external table.
To create the external table, you can use the CREATE EXTERNAL TABLE statement, specifying the column names, data types, and file format options.
Syntax to create External table based on Microsoft documentaition.
CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
( [ LOCATION = '<prefix>://<path>[:<port>]' ]
[ [ , ] CREDENTIAL = <credential_name> ]
[ [ , ] TYPE = HADOOP ]
)
[ ; ]
Example to create external table. refer this example this
CREATE EXTERNAL TABLE csv_table
(
column1 INT,
column2 VARCHAR(50),
column3 DATE
)
WITH
(
LOCATION='/datalake/csv_files',
DATA_SOURCE = AzureDataLakeStorage,
FILE_FORMAT = CSV_FILE_FORMAT,
)
Source:
Sink:
As we are using the Polybase feature. you will need to enable the staging as well.
Once you have created the external table, you can use the CTAS statement to create a new internal table with the same structure as the external table.
Upvotes: 0