Timo Strotmann
Timo Strotmann

Reputation: 391

Is it possible to filter on a dynamic date range with an Azure Synapse Serverless SQL Pool?

I have a large dataset with ~40.000.000.000 records, which is partitioned by date.

To ensure that a single user does not query the entire dataset at once, I would like to provide a SQL function instead of direct access to the dataset. The SQL function has an @from and an @to data as input parameters and checks that the function call will not query more than the data of 14 days at once.

So far so good, but what doesn't work is to use a date range (list of dates) as filter criteria in my OPENROWSET query.

I have already tried to provide the date list as subquery, #tempTable and @tableVar, but nothing seems to work.

Below you can see my last attempt with a subquery.

DECLARE @From DATE = '2020-10-17', @To DATE = '2020-10-25';


SELECT  *
FROM OPENROWSET(BULK '/date=*/*', DATA_SOURCE = 'data', FORMAT='PARQUET') as rows
WHERE  rows.filepath(1) In ( 
    (
    SELECT TOP (DATEDIFF(DAY, @From, @To) + 1) Date = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY a.object_id) - 1, @From)
    FROM    sys.all_objects a
    CROSS JOIN sys.all_objects b 

) -- Date list based on the @From and @To 

When I run this code I get the following error message, just like when I use a #tempTable

The query references an object that is not supported in distributed processing mode.

I am grateful for every tip.

Upvotes: 0

Views: 1498

Answers (1)

Timo Strotmann
Timo Strotmann

Reputation: 391

Oh man, thanks for the tip kashyap.

Somehow I did not try the most obvious solution before:

DECLARE @From DATE = '2020-10-17', @To DATE = '2020-10-25';

SELECT  *
FROM OPENROWSET(BULK '/date=*/*', DATA_SOURCE = 'data', FORMAT='PARQUET') as rows
WHERE  rows.filepath(1) >= @From and rows.filepath(1) <= @To

This solution provides the correct result and also utilizes the partitioning to ensure that as few data as possible has to be retrieved from the data lake.

Upvotes: 2

Related Questions