Buchiman
Buchiman

Reputation: 320

Cumulative count to group in SQL

I have a query that returns a list of dates, now what I want to achieve is to group them and identify the grouping with a null.

Example Query:

SELECT  *,
ROW_NUMBER() OVER(PARTITION BY Date1.Date1 ORDER BY Date1.Date1) as rwnumber
FROM Date1
    LEFT JOIN Date2
        ON Date1.Date1= Date2.Date1

OUTPUT:

ID     DATE1         DATE2         rwnumber
1    06-10-2019    06-10-2019         1
1    06-25-2019    06-25-2019         1
1    07-10-2019    NULL               1
1    07-25-2019    07-25-2019         1    
1    08-10-2019    08-10-2019         1

What I want to achieve is this desired OUTPUT:

ID     DATE1         DATE2         rwnumber
1    06-10-2019    06-10-2019         1
1    06-25-2019    06-25-2019         1
1    07-10-2019    NULL               2 or 0
1    07-25-2019    07-25-2019         3    
1    08-10-2019    08-10-2019         3

So I can group this dates to two group.

Other Example Output:

ID     DATE1         DATE2         rwnumber
1    06-10-2019    06-10-2019        group 1
1    06-25-2019    NULL               
1    07-10-2019    07-10-2019        group 2      
1    07-25-2019    07-25-2019        group 2    
1    08-10-2019    08-10-2019        group 2

ID     DATE1         DATE2         rwnumber
1    06-10-2019    06-10-2019        group 1
1    06-25-2019    06-25-2019        group 1               
1    07-10-2019    07-10-2019        group 1      
1    07-25-2019    NULL                  
1    08-10-2019    08-10-2019        group 2

Upvotes: 1

Views: 125

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269443

You seem to want a cumulative count of NULL values:

select t.*,
       (case when date2 is null then 0
             else 1 + sum(case when date2 is null then 1 else 0 end) over (order by date1)
        end) as grouping
from t;

This returns grouping values of 1, 2, 3 and so on for the groups. The NULL values all have a value of 0. This seems quite consistent with what you want to do.

Upvotes: 2

Vignesh Kumar A
Vignesh Kumar A

Reputation: 28403

RANK and DENSE_RANK are deterministic in this case, all rows with the same value for both the ordering and partitioning columns will end up with an equal result, whereas ROW_NUMBER will arbitrarily (non deterministically) assign an incrementing result to the tied rows.

you shouldn't be using ROW_NUMBER(),

  • use DENSE_RANK() instead
  • remove PARTITION BY

query,

    WITH T(ID, Date1)
         AS (SELECT 1,'06-10-2019' UNION ALL
             SELECT 1,'06-25-2019' UNION ALL
             SELECT 1,'07-10-2019' UNION ALL
             SELECT 1,'07-25-2019' UNION ALL
             SELECT 1,'08-10-2019')
    SELECT *,            
           DENSE_RANK() OVER(ORDER BY Month(Date1)) AS 'DENSE_RANK'
    FROM   T  

Upvotes: 2

Related Questions