Reputation: 1
I want to create an external table with partition columns based on year, month and day. I want to be able to query through this data in an optimized way. The queries could involve using a range of days, months or years. Therefore, I want to be able to use 'BETWEEN' in my query. This would help me avoid scanning the entire datalake and reduce the cost and time.
I used similar structure in AWS using Athena; however, I am having lots of trouble creating a table with partitions in Azure Synapse Analytics. I would really appreciate it if you could guide me with this problem.
Sample Query could be as follows:
SELECT * FROM my_partitioned_table
WHERE year BETWEEN 2023 AND 2024
AND month BETWEEN 1 AND 2
AND day BETWEEN 5 AND 6
Query to create an external table:
CREATE EXTERNAL TABLE [dbo].[my_partitioned_table]
(
[UTCDate] DATE,
[UTCHour] INT,
[LocalDate] DATE,
[LocalHour] INT,
[value] FLOAT
)
WITH (
LOCATION = '/path/year=*/month=*/day=*/*.parquet',
DATA_SOURCE = [datasource],
FILE_FORMAT = [fileformat]
)
PARTITION BY ( [year] INT, [month] INT, [day] INT );
The above query is wrong and I do not understand how to add partitions. I also made sure my files are stored in the format /path/year=/month=/day=/.parquet.
Upvotes: 0
Views: 619
Reputation: 4967
You can't create partitioned external tables in Synapse Serverless, you have to create the partitioned table in Spark. You can however in the view create a partitioned view.
Note: Your query to create the table if you do it in Dedicated SQL, the 'partition by' should reference the column, you have not defined [year], [month] or [day] etc. It should get the year from a column value, for example YEAR([UTCDate]) AS Year as a column.
Upvotes: 0