Anonymous

Reputation: 440

Dense rank, partitioned by column A, incremented by change in column B but ordered by column C

I have a table like so:

name|subtitle|date
ABC|excel|2018-07-07
ABC|excel|2018-08-08
ABC|ppt|2018-09-09
ABC|ppt|2018-10-10
ABC|excel|2018-11-11
ABC|ppt|2018-12-12
DEF|ppt|2018-12-31

I want to add a column that increments whenever there's a change in the subtitle, like so:

name|subtitle|date|Group_Number
ABC|excel|2018-07-07|1
ABC|excel|2018-08-08|1
ABC|ppt|2018-09-09|2
ABC|ppt|2018-10-10|2
ABC|excel|2018-11-11|3
ABC|ppt|2018-12-12|4
DEF|ppt|2018-12-31|1

The problem is that if I use DENSE_RANK() OVER (PARTITION BY name ORDER BY subtitle), not only does it put all rows with the same subtitle into one group, it also loses the date ordering. I've also tried the LAG function, but it doesn't seem very useful when you're trying to increment a column.

Is there a simple way to achieve this?

Bear in mind that the table I'm using has hundreds of different names.

Upvotes: 1

Views: 962

Answers (1)

Panagiotis Kanavos

Reputation: 131571

Quick answer

declare @table table (name varchar(20),subtitle varchar(20),[date] date )

insert into @table (name,subtitle,date)
values
('ABC','excel','2018-07-07'),
('ABC','excel','2018-08-08'),
('ABC','ppt','2018-09-09'),
('ABC','ppt','2018-10-10'),
('ABC','excel','2018-11-11'),
('ABC','ppt','2018-12-12'),
('DEF','ppt','2018-12-31');

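with nums as (
    -- flag rows whose subtitle differs from the previous row of the same name
    select *,  
         case when subtitle != lag(subtitle,1) over (partition by name order by date) 
              then 1 
              else 0 end as num
    from @table
)
-- a running total of the flags numbers the groups; +1 makes them start at 1
select *,
    1+sum(num) over (partition by name order by date) AS Group_Number
from nums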

Explanation

What you ask isn't exactly ranking. You are trying to detect "islands" where the name and subtitle are the same in a sequence ordered strictly by date.

To do that, you can compare the current row's value to the previous one. If they match, you are in the same "island". If not, there's a switch. You can use that to emit e.g. 1 each time a change is detected.

That's what this expression does:

CASE WHEN subtitle != LAG(subtitle,1) OVER (PARTITION BY name ORDER BY date) 
     THEN 1 
     ELSE 0 END AS num

Once you have that, you can calculate the number of changes with a running total:

sum(num) over (partition by name order by date) AS Group_Number

This will generate values starting from 0. To get numbers starting from 1, just add 1:

1+sum(num) over (partition by name order by date) AS Group_Number
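
One caveat worth checking against your own data: when ORDER BY is given without an explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that share both the same name and the same date are treated as peers and summed together. The sample data has no such ties, so the result is unchanged there. A minimal sketch with an explicit ROWS frame, assuming it runs in the same batch as the @table declaration above:

```sql
with nums as (
    select *,
         case when subtitle != lag(subtitle,1) over (partition by name order by [date])
              then 1
              else 0 end as num
    from @table
)
select *,
    -- ROWS counts physical rows up to the current one instead of grouping date ties
    1 + sum(num) over (partition by name
                       order by [date]
                       rows between unbounded preceding and current row) AS Group_Number
from nums;
```

If ties on date can occur, the order of the tied rows is still arbitrary, so a unique tie-breaking column in the ORDER BY would be needed for repeatable numbering.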

UPDATE

As T. Clausen explains in the comments, reversing the comparison will get rid of the +1:

with nums as (

    select *,  
         case when subtitle = lag(subtitle,1) over (partition by name order by date) 
              then 0 
              else 1 end as num
    from @table
)
select *,
    sum(num) over (partition by name order by date) AS Group_Number
from nums

It's also a better way to detect islands, even if the results in this case are the same. The first query would produce this result:

name  subtitle  date        num  Group_Number
ABC   excel     2018-07-07  0    1
ABC   excel     2018-08-08  0    1
ABC   ppt       2018-09-09  1    2
ABC   ppt       2018-10-10  0    2
ABC   excel     2018-11-11  1    3
ABC   ppt       2018-12-12  1    4
DEF   ppt       2018-12-31  0    1

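The query emits 1 when a subtitle break is detected, except at the boundaries: on the first row of each name LAG returns NULL, the != comparison is not true, and the CASE falls through to 0.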
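
A quick way to inspect the boundary rows is to show the value LAG returns next to the flag. This is only a sketch: the alias prev_subtitle is made up for illustration, and the query assumes it runs in the same batch as the @table declaration above.

```sql
select name, subtitle, [date],
       -- NULL on the first row of each name, because there is no previous row
       lag(subtitle,1) over (partition by name order by [date]) as prev_subtitle,
       -- comparing with NULL is never true, so the first row gets 0 here
       case when subtitle != lag(subtitle,1) over (partition by name order by [date])
            then 1
            else 0 end as num
from @table
order by name, [date];
```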

The second query returns:

name  subtitle  date        num  Group_Number
ABC   excel     2018-07-07  1    1
ABC   excel     2018-08-08  0    1
ABC   ppt       2018-09-09  1    2
ABC   ppt       2018-10-10  0    2
ABC   excel     2018-11-11  1    3
ABC   ppt       2018-12-12  1    4
DEF   ppt       2018-12-31  1    1

In this case 1 is emitted for each change, including the boundaries: the first row of each name has no previous subtitle to equal, so it takes the ELSE 1 branch.

Upvotes: 2
