Jonito
Jonito

Reputation: 459

Snowflake not sure about query time comparaison

I would like to calculate the attrition comparing the last month from the selected. In this case i want to know the lost clients from 2020-12-01 to 2021-01-01.

SELECT  DISTINCT("Client Ref"),"Zone","Market Place","Report Period"
FROM    "REPORT_DB"."PBI"."Revenue" d
WHERE   "Report Period" ='2020-12-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL 
AND
NOT EXISTS
        (
        SELECT  "Client Ref"
        FROM    "REPORT_DB"."PBI"."Revenue" t
        WHERE   "Report Period" ='2021-01-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL AND d."Client Ref"=t."Client Ref"
        )

Is it the correct way to retrieve it ?

Regards.

Upvotes: 2

Views: 62

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

So by adding a CTE with some dummy data, and changing the column names to all be safe

WITH data AS (
    SELECT * FROM VALUES 
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM data AS d
WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL 
AND
NOT EXISTS
        (
        SELECT  t.Client_Ref
        FROM    data t
        WHERE   t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND t.Client_Ref IS NOT NULL AND d.Client_Ref=t.Client_Ref
        );

you base form for SQL works and returns:

CLIENT_REF  ZONE    MARKET_PLACE    REPORT_PERIOD
2           a       UK              2020-12-01

which is the expected results.

This Query is a correlated sub-query, which Snowflake has limited support for. So while this works, as you change the query it might run into the Unsupported subquery type cannot be evaluated error see SO correlated sub-query question.

Thw base query can be written in a un-correlated form, by using a LEFT JOIN and then WHERE x IS NULL pattern:

WITH data AS (
    SELECT * FROM VALUES 
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM data AS d
LEFT JOIN data AS t
    ON t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND d.Client_Ref=t.Client_Ref
WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL 
AND t.Client_Ref IS NULL;

which can be rewritten to do some filtering first if your data source has many rows not in the target result range, like so:

WITH data AS (
    SELECT * FROM VALUES 
    (1,'a','UK','2020-12-01'),
    (1,'a','UK','2021-01-01'),
    (2,'a','UK','2020-12-01'),
    (3,'a','UK','2021-01-01')
    v( Client_Ref, zone, Market_Place, Report_Period)
), wanted_data AS (
    SELECT DISTINCT Client_Ref, zone, Market_Place, Report_Period
    FROM data
    WHERE Report_Period BETWEEN '2020-12-01' AND '2021-01-01'
    AND Market_Place ='UK' AND Client_Ref IS NOT NULL
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM wanted_data AS d
LEFT JOIN wanted_data AS t
    ON t.Report_Period ='2021-01-01'AND d.Client_Ref=t.Client_Ref
WHERE d.Report_Period ='2020-12-01' 
AND t.Client_Ref IS NULL;

But for the life of me, my SQL does not work if I name the columns "Client Ref" like you have, so that I cannot answer that part, but this is how you can structure your SQL.

Upvotes: 1

Related Questions