Abhijit
Abhijit

Reputation: 17

How we can create a Dimensional model on Delta Lake using Synapses Serverless SQL pool, Without using Azure Analytics Services?

How we can create a Dimensional model on Delta Lake using Synapses Serverless SQL pool, Without using Azure Analytics Services?

Please help me to have the steps...

I found its can be done on Data Lake

Upvotes: 0

Views: 46

Answers (1)

Bhavani
Bhavani

Reputation: 5317

To create a dimensional model on Delta Lake using Synapse Serverless SQL pool, you can follow the procedure below:

Create the required data sources and source file formats, such as Parquet and delimited text formats, using the following code:

CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat] 
WITH (
      FORMAT_TYPE = PARQUET  
    ); 
CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat] 
    WITH ( FORMAT_TYPE = DELIMITEDTEXT ,
           FORMAT_OPTIONS (
             FIELD_TERMINATOR = ',',
             FIRST_ROW = 2,
             USE_TYPE_DEFAULT = FALSE
            ))

CREATE EXTERNAL DATA SOURCE [files_badls_dfs_core_windows_net] 
    WITH (
        LOCATION = 'abfss://<containerNmae>@<ADLSName>.dfs.core.windows.net' 
    )

Create the source external table using the following code:

CREATE EXTERNAL TABLE dbo.employee (
    [EMPLOYEE_ID] Int,
    [FIRST_NAME] nvarchar(4000),
    [LAST_NAME] nvarchar(4000),
    [EMAIL] nvarchar(4000),
    [PHONE_NUMBER] nvarchar(4000),
    [HIRE_DATE] nvarchar(4000),
    [JOB_ID] nvarchar(4000),
    [SALARY] INT,
    [COMMISSION_PCT] nvarchar(4000),
    [MANAGER_ID] INT,
    [DEPARTMENT_ID] INT
    )
    WITH (
    LOCATION = 'inputs/employees.csv',
    DATA_SOURCE = [files_badls_dfs_core_windows_net],
    FILE_FORMAT = [SynapseDelimitedTextFormat]
    )

For the initial dimension load, extract source data from the CSV file and load it to the Data Lake using the following code:

CREATE EXTERNAL TABLE Dimemployee WITH
(
    LOCATION = 'datawarehouse/conformed/dimemployee/1',
    DATA_SOURCE= files_badls_dfs_core_windows_net,
    FILE_FORMAT = SynapseParquetFormat
   
)   
AS
SELECT ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) as Employeekey,
EMPLOYEE_ID as EmployeeBusinesskey,
FIRST_NAME,
GETDATE() as DateTimeLoaded
FROM employee

enter image description here

The dimension data is loaded into a sequence number folder structure as shown below:

enter image description here

For further steps, you can follow this.

Upvotes: 0

Related Questions