Create Partitions in External Tables in Azure Synapse SQL Database (Serverless)

Question

I want to create an external table with partition columns based on year, month and day. I want to be able to query through this data in an optimized way. The queries could involve using a range of days, months or years. Therefore, I want to be able to use 'BETWEEN' in my query. This would help me avoid scanning the entire datalake and reduce the cost and time.

I used similar structure in AWS using Athena; however, I am having lots of trouble creating a table with partitions in Azure Synapse Analytics. I would really appreciate it if you could guide me with this problem.

Sample Query could be as follows:

SELECT * FROM my_partitioned_table
WHERE year BETWEEN 2023 AND 2024
AND month BETWEEN 1 AND 2
AND day BETWEEN 5 AND 6

Query to create an external table:

CREATE EXTERNAL TABLE [dbo].[my_partitioned_table]
(
[UTCDate] DATE,
[UTCHour] INT,
[LocalDate] DATE,
[LocalHour] INT,
[value] FLOAT
)
WITH (
LOCATION = '/path/year=*/month=*/day=*/*.parquet',
DATA_SOURCE = [datasource],
FILE_FORMAT = [fileformat]
)
PARTITION BY ( [year] INT, [month] INT, [day] INT );

The above query is wrong and I do not understand how to add partitions. I also made sure my files are stored in the format /path/year=/month=/day=/.parquet.

Create Partitions in External Tables in Azure Synapse SQL Database (Serverless)

Answers (1)

Related Questions