Stahne
Stahne

Reputation: 13

Snowflake - Get all the dates that overlap between two values so that they can be both displayed and counted

Self-taught SQL user here. I am trying to find where two values have overlapping dates, display those overlapping dates, and count them. This is a look back at the previous fiscal month. Here is what I am currently working with.

WITH FINAL AS
(
WITH OVER_LAP AS
(
WITH PRODUCT_1 AS
(SELECT ZIP_CODE
,REQUESTED_DATE
,SALE_DATE
,RETURN_DATE
,CANCEL_ORDER_DATE
FROM MAIN_DB
WHERE PRODUCT = 'PRODUCT 1'
AND FISCAL_MONTH = (SELECT DATEADD(MONTH, -1, FISCAL_MONTH)::DATE FISCAL_MONTH
                     FROM DATE_TABLE 
                     WHERE DATE = CURRENT_DATE())
)
,PRODUCT_2 AS
(SELECT ZIP_CODE
,REQUESTED_DATE
,SALE_DATE
,RETURN_DATE
FROM MAIN_DB
WHERE PRODUCT = 'PRODUCT 2'
AND FISCAL_MONTH = (SELECT DATEADD(MONTH, -1, FISCAL_MONTH)::DATE FISCAL_MONTH
                     FROM DATE_TABLE 
                     WHERE DATE = CURRENT_DATE())
)
SELECT ZIP_CODE
,(CASE WHEN PRODUCT_1.ZIP_CODE = PRODUCT_2.ZIP_CODE
         AND (PRODUCT_2.SALE_DATE >= PRODUCT_1.SALE_DATE
           AND (PRODUCT_2.SALE_DATE <= PRODUCT_1.RETURN_DATE OR PRODUCT_2.SALE_DATE <= PRODUCT_1.CANCEL_ORDER_DATE))
            THEN *<Display These Days, one day per row>* END) AS PRODUCT_OL
FROM PRODUCT_1 P1
LEFT JOIN PRODUCT_2 P2 ON P1.ZIP_CODE = P2.ZIP_CODE
WHERE P1.SALE_DATE IS NOT NULL
AND P2.SALE_DATE IS NOT NULL
)
SELECT MD.STATE
,MD.CITY
,COUNT(DISTINCT PRODUCT_OL) AS OL_COUNT
FROM MAIN_DB MD
LEFT JOIN FINAL ON MD.ZIP_CODE = FINAL.ZIP_CODE
GROUP BY MD.STATE
,MD.CITY

The bottom 1/3 is where I'm getting hung up obviously. Unfortunately I need it done in a manner similar to this and can't simply do DATEDIFF to get the number of days between PRODUCT_2.SALE_DATE and PRODUCT_1.RETURN_DATE/CANCEL_ORDER_DATE. Any insight or assistance would be greatly appreciated.

Upvotes: 1

Views: 4419

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

To find the intersection of two date ranges you would:

so avoiding trying to work what all your begining CTEs are doing...

WITH data AS (        
    SELECT * FROM VALUES 
        (1,'2020-05-01', '2020-05-20'),
        (2,'2020-05-10', '2020-05-25'),
        (3,'2020-05-15', '2020-05-30'), 
        (4,'2020-05-24', '2020-06-14') 
        v(id, start_date, end_date)
)
select 
    d1.id, 
    d1.start_date, 
    d1.end_date, 
    d2.id, 
    d2.start_date, 
    d2.end_date, 
    greatest(d1.start_date,d2.start_date) as overlap_start,
    least(d1.end_date,d2.end_date) as overlap_end
from data as d1
join data as d2 
    on d1.id != d2.id 
    and d1.start_date < d2.end_date and d1.end_date > d2.start_date
order by 1,4;

this gives you all permutations that excluding the same idea. If you want just the combinations replace on d1.id != d2.id with on d1.id > d2.id

but the gist is D1.start has to be before D2.end AND D1.end has to be after d2.start for things with fractional values like timestamp. But if you are used integer values or date where the end is inclusive, then you want to allow D1.start >= D2.end AND D1.end >= d2.start

example output for original sql:

ID  START_DATE  END_DATE    ID  START_DATE  END_DATE    OVERLAP_START   OVERLAP_END
1   2020-05-01  2020-05-20  2   2020-05-10  2020-05-25  2020-05-10  2020-05-20
1   2020-05-01  2020-05-20  3   2020-05-15  2020-05-30  2020-05-15  2020-05-20
2   2020-05-10  2020-05-25  1   2020-05-01  2020-05-20  2020-05-10  2020-05-20
2   2020-05-10  2020-05-25  3   2020-05-15  2020-05-30  2020-05-15  2020-05-25
2   2020-05-10  2020-05-25  4   2020-05-24  2020-06-14  2020-05-24  2020-05-25
3   2020-05-15  2020-05-30  1   2020-05-01  2020-05-20  2020-05-15  2020-05-20
3   2020-05-15  2020-05-30  2   2020-05-10  2020-05-25  2020-05-15  2020-05-25
3   2020-05-15  2020-05-30  4   2020-05-24  2020-06-14  2020-05-24  2020-05-30
4   2020-05-24  2020-06-14  2   2020-05-10  2020-05-25  2020-05-24  2020-05-25
4   2020-05-24  2020-06-14  3   2020-05-15  2020-05-30  2020-05-24  2020-05-30

But that is not what you ask.

You question was more If a have a start/end date, how do I get all the rows between those. To which the answer is you have a date_table join to that.

WITH rows_of_interest AS (
    select
       'important other stuff' as other,
       '2020-05-01'::date as p2_sale_date,
       '2020-05-10'::date as p1_return_or_cancel_date
 )
 SELECT roi.other, 
     roi.p2_sale_date,
     roi.p1_return_or_cancel_date,
     d.date as day_in_range   
 FROM rows_of_interest AS roi
 JOIN date_table d 
     ON roi.p2_sale_date >= d.date AND roi.p1_return_or_cancel_date <= d.date

and then there is SQL style things I would do differently.

first thing I notice is fiscal_month stuff can be moved to a CTE and joined on for reduce the rows a little like so:

    WITH cur_fiscal_month AS (
        SELECT DATEADD(MONTH, -1, fiscal_month)::DATE AS fm
                 FROM date_table 
                 WHERE date = CURRENT_DATE()
    ), product_1 AS (
        SELECT zip_code
            ,requested_date
            ,sale_date
            ,return_date
            ,cancel_order_date
        FROM main_db
        JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
        WHERE product = 'PRODUCT 1'
    ), product_2 AS (
        SELECT zip_code
            ,requested_date
            ,sale_date
            ,return_date
        FROM main_db
        JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
        WHERE product = 'PRODUCT 2'
    )

then also when you use product_1 and product_2 you have the same WHERE clause on both of them .SALE_DATE IS NOT NULL thus that should be push into the CTE as both those tables are used just once. Yes if the wind is point in the correct direction Snowflake with do this for you. But it makes the later code cleaner ihmo.

Also in that same block of code, you alias product_1 to p1, but in the CASE use the table name, strictly speaking once you have an alias you should only use that, and it made the case code simpler to read, almost the reason for aliases.

And I tend to have all SQL tokens is CAPS and all identifiers in lower case, just so there is less yelling.. that one is purely me.

and you don't need to nest your CTE three deep.

So given all that this would be my SQL:

WITH cur_fiscal_month AS (
    SELECT DATEADD(MONTH, -1, fiscal_month)::DATE AS fm
             FROM date_table 
             WHERE date = CURRENT_DATE()
), product_1 AS (
    SELECT zip_code
        ,requested_date
        ,sale_date
        ,greatest(return_date, cancel_order_date) AS great_end_date
        ,least(return_date, cancel_order_date) AS least_end_date
    FROM main_db
    JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
    WHERE product = 'PRODUCT 1'
        AND sale_date IS NOT NULL
), product_2 AS (
    SELECT zip_code
        ,requested_date
        ,sale_date
        --,return_date
    FROM main_db
    JOIN cur_fiscal_month cfm ON cfm.fm = fiscal_month
    WHERE product = 'PRODUCT 2'
        AND sale_date IS NOT NULL
), rows_of_interesting_sales AS (        
    SELECT zip_code
        ,p2.sale_date as start_date
        ,great_end_date as end_date
    FROM product_1 AS p1
    LEFT JOIN product_2 AS p2 
        ON p1.zip_code = p2.zip_code 
        AND p2.sale_date >= p1.sale_date 
        AND p2.sale_date <= great_end_date
), final AS (
    SELECT p.zip_code,
        d.date as product_ol
    FROM rows_of_interesting_sales AS p
    JOIN date_table d 
        ON p.start_date >= d.date AND p.end_date <= d.date
)
SELECT md.state
    ,md.city
    ,COUNT(DISTINCT f.product_ol) AS ol_count
FROM main_db AS md
LEFT JOIN final AS f 
    ON md.zip_code = final.zip_code
GROUP by md.state, md.city;

But there is something uncomfortable to me about your product_1 and product_2 tables, and I suspect to are wanting the combination on all products... but your use case seems strange..

Upvotes: 4

Related Questions