GHM11O
GHM11O

Reputation: 11

SQL query to find continuous local max, min of date based on category column

I have the following data set

    Customer_ID Category    FROM_DATE    TO_DATE
      1            5        1/1/2000    12/31/2001
      1            6        1/1/2002    12/31/2003
      1            5        1/1/2004    12/31/2005 
      2            7        1/1/2010    12/31/2011
      2            7        1/1/2012    12/31/2013
      2            5        1/1/2014    12/31/2015
      3            7        1/1/2010    12/31/2011
      3            7        1/5/2012    12/31/2013
      3            5        1/1/2014    12/31/2015

The result I want to achieve is to find continuous local min/max date for Customers with the same category and identify any gap in dates:

    Customer_ID      FROM_Date      TO_Date       Category  
      1              1/1/2000      12/31/2001         5  
      1              1/1/2002      12/31/2003         6
      1              1/1/2004      12/31/2005         5
      2              1/1/2010      12/31/2013         7  
      2              1/1/2014      12/31/2015         5
      3              1/1/2010      12/31/2011         7
      3              1/5/2012      12/31/2013         7
      3              1/1/2014      12/31/2015         5

My code works fine for customer 1 (return all 3 rows) and customer 2(return 2 rows with min and max date for each category) but for customer 3, it cannot identify the gap between 12/31/2011 and 1/5/2012 for category 7.

    Customer_ID      FROM_Date      TO_Date       Category  
      3              1/1/2010      12/31/2013         7  
      3              1/1/2014      12/31/2015         5

Here is my code:

SELECT Customer_ID, Category, min(From_Date), max(To_Date) FROM 
(
SELECT Customer_ID, Category, From_Date,To_Date
,row_number() over (order by member_id, To_Date) - row_number() over (partition by Customer_ID order by Category) as p
  FROM FFS_SAMP
  )  X
 group by Customer_ID,Category,p
 order by Customer_ID,min(From_Date),Max(To_Date)

Upvotes: 1

Views: 394

Answers (2)

GMB
GMB

Reputation: 222432

Your attempt is quite close. You just need to fix the over() clause of the window functions:

select customer_id, category, min(from_date), max(to_date) 
from (
    select 
        fs.*,
        row_number() over (partition by customer_id order from_date) 
            - row_number() over (partition by customer_id, category order by from_date) as grp
  from ffs_samp fs
) x
group by customer_id, category, grp
order by customer_id, min(from_date)

Note that this method assumes no gaps or overlalp in the periods of a given customer, as show in your sample data.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269643

This is a type of gaps and islands problem. Probably the safest method is to use a cumulative max() to look for overlaps with previous records. Where there is no overlap, then an "island" of records starts. So:

select customer_id, min(from_date), max(to_date), category
from (select t.*,
             sum(case when prev_to_date >= from_date then 0 else 1 end) over
                 (partition by customer_id, category
                  order by from_date
                 ) as grp
      from (select t.*,
                   max(to_date) over (partition by customer_id, category
                                      order by from_date
                                      rows between unbounded preceding and 1 preceding
                                     ) as prev_to_date
            from t
           ) t
     ) t
group by customer_id, category, grp;

Upvotes: 1

Related Questions