Reputation: 13
Self-taught SQL user here. I am trying to find where two values have overlapping dates, display those overlapping dates, and count them. This is a look back at the previous fiscal month. Here is what I am currently working with.
WITH FINAL AS
(
WITH OVER_LAP AS
(
WITH PRODUCT_1 AS
(SELECT ZIP_CODE
,REQUESTED_DATE
,SALE_DATE
,RETURN_DATE
,CANCEL_ORDER_DATE
FROM MAIN_DB
WHERE PRODUCT = 'PRODUCT 1'
AND FISCAL_MONTH = (SELECT DATEADD(MONTH, -1, FISCAL_MONTH)::DATE FISCAL_MONTH
FROM DATE_TABLE
WHERE DATE = CURRENT_DATE())
)
,PRODUCT_2 AS
(SELECT ZIP_CODE
,REQUESTED_DATE
,SALE_DATE
,RETURN_DATE
FROM MAIN_DB
WHERE PRODUCT = 'PRODUCT 2'
AND FISCAL_MONTH = (SELECT DATEADD(MONTH, -1, FISCAL_MONTH)::DATE FISCAL_MONTH
FROM DATE_TABLE
WHERE DATE = CURRENT_DATE())
)
SELECT ZIP_CODE
,(CASE WHEN PRODUCT_1.ZIP_CODE = PRODUCT_2.ZIP_CODE
AND (PRODUCT_2.SALE_DATE >= PRODUCT_1.SALE_DATE
AND (PRODUCT_2.SALE_DATE <= PRODUCT_1.RETURN_DATE OR PRODUCT_2.SALE_DATE <= PRODUCT_1.CANCEL_ORDER_DATE))
THEN *<Display These Days, one day per row>* END) AS PRODUCT_OL
FROM PRODUCT_1 P1
LEFT JOIN PRODUCT_2 P2 ON P1.ZIP_CODE = P2.ZIP_CODE
WHERE P1.SALE_DATE IS NOT NULL
AND P2.SALE_DATE IS NOT NULL
)
SELECT MD.STATE
,MD.CITY
,COUNT(DISTINCT PRODUCT_OL) AS OL_COUNT
FROM MAIN_DB MD
LEFT JOIN FINAL ON MD.ZIP_CODE = FINAL.ZIP_CODE
GROUP BY MD.STATE
,MD.CITY
The bottom 1/3 is where I'm getting hung up obviously. Unfortunately I need it done in a manner similar to this and can't simply do DATEDIFF to get the number of days between PRODUCT_2.SALE_DATE and PRODUCT_1.RETURN_DATE/CANCEL_ORDER_DATE. Any insight or assistance would be greatly appreciated.
Upvotes: 1
Views: 4419
Reputation: 25903
To find the intersection of two date ranges you would:
so avoiding trying to work what all your begining CTEs are doing...
WITH data AS (
SELECT * FROM VALUES
(1,'2020-05-01', '2020-05-20'),
(2,'2020-05-10', '2020-05-25'),
(3,'2020-05-15', '2020-05-30'),
(4,'2020-05-24', '2020-06-14')
v(id, start_date, end_date)
)
select
d1.id,
d1.start_date,
d1.end_date,
d2.id,
d2.start_date,
d2.end_date,
greatest(d1.start_date,d2.start_date) as overlap_start,
least(d1.end_date,d2.end_date) as overlap_end
from data as d1
join data as d2
on d1.id != d2.id
and d1.start_date < d2.end_date and d1.end_date > d2.start_date
order by 1,4;
this gives you all permutations that excluding the same idea. If you want just the combinations replace on d1.id != d2.id
with on d1.id > d2.id
but the gist is D1.start has to be before D2.end AND D1.end has to be after d2.start for things with fractional values like timestamp. But if you are used integer values or date where the end is inclusive, then you want to allow D1.start >= D2.end AND D1.end >= d2.start
example output for original sql:
ID START_DATE END_DATE ID START_DATE END_DATE OVERLAP_START OVERLAP_END
1 2020-05-01 2020-05-20 2 2020-05-10 2020-05-25 2020-05-10 2020-05-20
1 2020-05-01 2020-05-20 3 2020-05-15 2020-05-30 2020-05-15 2020-05-20
2 2020-05-10 2020-05-25 1 2020-05-01 2020-05-20 2020-05-10 2020-05-20
2 2020-05-10 2020-05-25 3 2020-05-15 2020-05-30 2020-05-15 2020-05-25
2 2020-05-10 2020-05-25 4 2020-05-24 2020-06-14 2020-05-24 2020-05-25
3 2020-05-15 2020-05-30 1 2020-05-01 2020-05-20 2020-05-15 2020-05-20
3 2020-05-15 2020-05-30 2 2020-05-10 2020-05-25 2020-05-15 2020-05-25
3 2020-05-15 2020-05-30 4 2020-05-24 2020-06-14 2020-05-24 2020-05-30
4 2020-05-24 2020-06-14 2 2020-05-10 2020-05-25 2020-05-24 2020-05-25
4 2020-05-24 2020-06-14 3 2020-05-15 2020-05-30 2020-05-24 2020-05-30
But that is not what you ask.
You question was more If a have a start/end date, how do I get all the rows between those. To which the answer is you have a date_table
join to that.
WITH rows_of_interest AS (
select
'important other stuff' as other,
'2020-05-01'::date as p2_sale_date,
'2020-05-10'::date as p1_return_or_cancel_date
)
SELECT roi.other,
roi.p2_sale_date,
roi.p1_return_or_cancel_date,
d.date as day_in_range
FROM rows_of_interest AS roi
JOIN date_table d
ON roi.p2_sale_date >= d.date AND roi.p1_return_or_cancel_date <= d.date
and then there is SQL style things I would do differently.
first thing I notice is fiscal_month stuff can be moved to a CTE and joined on for reduce the rows a little like so:
WITH cur_fiscal_month AS (
SELECT DATEADD(MONTH, -1, fiscal_month)::DATE AS fm
FROM date_table
WHERE date = CURRENT_DATE()
), product_1 AS (
SELECT zip_code
,requested_date
,sale_date
,return_date
,cancel_order_date
FROM main_db
JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
WHERE product = 'PRODUCT 1'
), product_2 AS (
SELECT zip_code
,requested_date
,sale_date
,return_date
FROM main_db
JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
WHERE product = 'PRODUCT 2'
)
then also when you use product_1 and product_2 you have the same WHERE clause on both of them .SALE_DATE IS NOT NULL
thus that should be push into the CTE as both those tables are used just once. Yes if the wind is point in the correct direction Snowflake with do this for you. But it makes the later code cleaner ihmo.
Also in that same block of code, you alias product_1
to p1
, but in the CASE use the table name, strictly speaking once you have an alias you should only use that, and it made the case code simpler to read, almost the reason for aliases.
And I tend to have all SQL tokens is CAPS and all identifiers in lower case, just so there is less yelling.. that one is purely me.
and you don't need to nest your CTE three deep.
So given all that this would be my SQL:
WITH cur_fiscal_month AS (
SELECT DATEADD(MONTH, -1, fiscal_month)::DATE AS fm
FROM date_table
WHERE date = CURRENT_DATE()
), product_1 AS (
SELECT zip_code
,requested_date
,sale_date
,greatest(return_date, cancel_order_date) AS great_end_date
,least(return_date, cancel_order_date) AS least_end_date
FROM main_db
JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
WHERE product = 'PRODUCT 1'
AND sale_date IS NOT NULL
), product_2 AS (
SELECT zip_code
,requested_date
,sale_date
--,return_date
FROM main_db
JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
WHERE product = 'PRODUCT 2'
AND sale_date IS NOT NULL
), rows_of_interesting_sales AS (
SELECT zip_code
,p2.sale_date as start_date
,great_end_date as end_date
FROM product_1 AS p1
LEFT JOIN product_2 AS p2
ON p1.zip_code = p2.zip_code
AND p2.sale_date >= p1.sale_date
AND p2.sale_date <= great_end_date
), final AS (
SELECT p.zip_code,
d.date as product_ol
FROM rows_of_interesting_sales AS p
JOIN date_table d
ON p.start_date >= d.date AND p.end_date <= d.date
)
SELECT md.state
,md.city
,COUNT(DISTINCT f.product_ol) AS ol_count
FROM main_db AS md
LEFT JOIN final AS f
ON md.zip_code = final.zip_code
GROUP by md.state, md.city;
But there is something uncomfortable to me about your product_1
and product_2
tables, and I suspect to are wanting the combination on all products... but your use case seems strange..
Upvotes: 4