Reputation: 459
I would like to calculate the attrition comparing the last month from the selected. In this case i want to know the lost clients from 2020-12-01 to 2021-01-01.
SELECT DISTINCT("Client Ref"),"Zone","Market Place","Report Period"
FROM "REPORT_DB"."PBI"."Revenue" d
WHERE "Report Period" ='2020-12-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL
AND
NOT EXISTS
(
SELECT "Client Ref"
FROM "REPORT_DB"."PBI"."Revenue" t
WHERE "Report Period" ='2021-01-01' AND "Market Place" ='UK' AND "Client Ref" IS NOT NULL AND d."Client Ref"=t."Client Ref"
)
Is it the correct way to retrieve it ?
Regards.
Upvotes: 2
Views: 62
Reputation: 25903
So by adding a CTE with some dummy data, and changing the column names to all be safe
WITH data AS (
SELECT * FROM VALUES
(1,'a','UK','2020-12-01'),
(1,'a','UK','2021-01-01'),
(2,'a','UK','2020-12-01'),
(3,'a','UK','2021-01-01')
v( Client_Ref, zone, Market_Place, Report_Period)
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM data AS d
WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL
AND
NOT EXISTS
(
SELECT t.Client_Ref
FROM data t
WHERE t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND t.Client_Ref IS NOT NULL AND d.Client_Ref=t.Client_Ref
);
you base form for SQL works and returns:
CLIENT_REF ZONE MARKET_PLACE REPORT_PERIOD
2 a UK 2020-12-01
which is the expected results.
This Query is a correlated sub-query, which Snowflake has limited support for. So while this works, as you change the query it might run into the Unsupported subquery type cannot be evaluated
error see SO correlated sub-query question.
Thw base query can be written in a un-correlated form, by using a LEFT JOIN
and then WHERE x IS NULL
pattern:
WITH data AS (
SELECT * FROM VALUES
(1,'a','UK','2020-12-01'),
(1,'a','UK','2021-01-01'),
(2,'a','UK','2020-12-01'),
(3,'a','UK','2021-01-01')
v( Client_Ref, zone, Market_Place, Report_Period)
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM data AS d
LEFT JOIN data AS t
ON t.Report_Period ='2021-01-01' AND t.Market_Place ='UK' AND d.Client_Ref=t.Client_Ref
WHERE d.Report_Period ='2020-12-01' AND d.Market_Place ='UK' AND d.Client_Ref IS NOT NULL
AND t.Client_Ref IS NULL;
which can be rewritten to do some filtering first if your data source has many rows not in the target result range, like so:
WITH data AS (
SELECT * FROM VALUES
(1,'a','UK','2020-12-01'),
(1,'a','UK','2021-01-01'),
(2,'a','UK','2020-12-01'),
(3,'a','UK','2021-01-01')
v( Client_Ref, zone, Market_Place, Report_Period)
), wanted_data AS (
SELECT DISTINCT Client_Ref, zone, Market_Place, Report_Period
FROM data
WHERE Report_Period BETWEEN '2020-12-01' AND '2021-01-01'
AND Market_Place ='UK' AND Client_Ref IS NOT NULL
)
SELECT DISTINCT d.Client_Ref,d.zone,d.Market_Place,d.Report_Period
FROM wanted_data AS d
LEFT JOIN wanted_data AS t
ON t.Report_Period ='2021-01-01'AND d.Client_Ref=t.Client_Ref
WHERE d.Report_Period ='2020-12-01'
AND t.Client_Ref IS NULL;
But for the life of me, my SQL does not work if I name the columns "Client Ref"
like you have, so that I cannot answer that part, but this is how you can structure your SQL.
Upvotes: 1