Reputation: 440
I have a table like this:
name|subtitle|date
ABC|excel|2018-07-07
ABC|excel|2018-08-08
ABC|ppt|2018-09-09
ABC|ppt|2018-10-10
ABC|excel|2018-11-11
ABC|ppt|2018-12-12
DEF|ppt|2018-12-31
I want to add a column that increments whenever there's a change in the subtitle, like so:
name|subtitle|date|Group_Number
ABC|excel|2018-07-07|1
ABC|excel|2018-08-08|1
ABC|ppt|2018-09-09|2
ABC|ppt|2018-10-10|2
ABC|excel|2018-11-11|3
ABC|ppt|2018-12-12|4
DEF|ppt|2018-12-31|1
The problem is that if I use DENSE_RANK() OVER (PARTITION BY name ORDER BY subtitle), it not only puts all rows with the same subtitle into one group but also loses the date ordering. I've also tried the LAG function, but on its own it doesn't seem very useful for incrementing a column.
Is there a simple way to achieve this?
Bear in mind that the table I'm using has hundreds of different names.
Upvotes: 1
Views: 962
Reputation: 131571
Quick answer
declare @table table (name varchar(20),subtitle varchar(20),[date] date )
insert into @table (name,subtitle,date)
values
('ABC','excel','2018-07-07'),
('ABC','excel','2018-08-08'),
('ABC','ppt','2018-09-09'),
('ABC','ppt','2018-10-10'),
('ABC','excel','2018-11-11'),
('ABC','ppt','2018-12-12'),
('DEF','ppt','2018-12-31');
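-- Step 1: flag each row whose subtitle differs from the previous row's (per name, ordered by date)
with nums as (
    select *,
        case when subtitle != lag(subtitle,1) over (partition by name order by date)
            then 1
            else 0 end as num
    from @table
)
-- Step 2: a running total of the flags numbers the groups; the +1 makes them start at 1
select *,
    1+sum(num) over (partition by name order by date) AS Group_Number
from nums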
Explanation
What you are asking for isn't exactly ranking. You are trying to detect "islands": runs of consecutive rows, ordered strictly by date, in which the name and subtitle stay the same.
To do that, you can compare the current row's value to the previous one. If they match, you are in the same "island". If not, there's a switch. You can use that to emit, e.g., 1 each time a change is detected.
That's what this expression does:
CASE WHEN subtitle != LAG(subtitle,1) OVER (PARTITION BY name ORDER BY date)
THEN 1
ELSE 0 END AS num
Once you have that, you can calculate the number of changes with a running total:
sum(num) over (partition by name order by date) AS Group_Number
This will generate values starting from 0. To get numbers starting from 1, just add 1:
1+sum(num) over (partition by name order by date) AS Group_Number
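To see how the pieces fit together, here is a minimal sketch that puts the previous subtitle next to the change flag. The table variable `@t` and the alias `prev_subtitle` are names made up for this illustration; the rows are the sample data from the question.

```sql
-- Sample data from the question, loaded into a hypothetical table variable @t
declare @t table (name varchar(20), subtitle varchar(20), [date] date);
insert into @t (name, subtitle, [date])
values
    ('ABC','excel','2018-07-07'),
    ('ABC','excel','2018-08-08'),
    ('ABC','ppt','2018-09-09'),
    ('ABC','ppt','2018-10-10'),
    ('ABC','excel','2018-11-11'),
    ('ABC','ppt','2018-12-12'),
    ('DEF','ppt','2018-12-31');

-- Expose the value LAG sees, then derive the change flag from it
with prev as (
    select *,
           lag(subtitle) over (partition by name order by [date]) as prev_subtitle
    from @t
)
select name, subtitle, [date], prev_subtitle,
       case when subtitle != prev_subtitle then 1 else 0 end as num
from prev
order by name, [date];

-- name  subtitle  date        prev_subtitle  num
-- ABC   excel     2018-07-07  NULL           0
-- ABC   excel     2018-08-08  excel          0
-- ABC   ppt       2018-09-09  excel          1
-- ABC   ppt       2018-10-10  ppt            0
-- ABC   excel     2018-11-11  ppt            1
-- ABC   ppt       2018-12-12  excel          1
-- DEF   ppt       2018-12-31  NULL           0
```

The first row of each name has no previous row, so `prev_subtitle` is NULL there and the flag stays 0. Summing `num` in date order then gives 0, 0, 1, 1, 2, 3 for ABC, which is why the +1 is needed.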
UPDATE
As T. Clausen explains in the comments, reversing the comparison will get rid of the +1:
with nums as (
    select *,
        case when subtitle = lag(subtitle,1) over (partition by name order by date)
            then 0
            else 1 end as num
    from @table
)
select *,
    sum(num) over (partition by name order by date) AS Group_Number
from nums
It's also a better way to detect islands, even though the results are the same in this case. The first query produces this result:
name subtitle date num Group_Number
ABC excel 2018-07-07 0 1
ABC excel 2018-08-08 0 1
ABC ppt 2018-09-09 1 2
ABC ppt 2018-10-10 0 2
ABC excel 2018-11-11 1 3
ABC ppt 2018-12-12 1 4
DEF ppt 2018-12-31 0 1
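The query emits 1 when a subtitle break is detected, except at the partition boundaries: on the first row of each name, LAG returns NULL, the != comparison is not true, and the CASE falls through to 0.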
The second query returns:
name subtitle date num Group_Number
ABC excel 2018-07-07 1 1
ABC excel 2018-08-08 0 1
ABC ppt 2018-09-09 1 2
ABC ppt 2018-10-10 0 2
ABC excel 2018-11-11 1 3
ABC ppt 2018-12-12 1 4
DEF ppt 2018-12-31 1 1
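In this case, 1 is emitted for each change, including the boundaries, because the = comparison against the NULL returned by LAG is not true and the CASE falls through to 1.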
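Once every row carries a Group_Number, the islands themselves can be summarised with a plain GROUP BY. The following is only a sketch built on the second query: the table variable `@t`, the CTE name `grouped`, and the output columns `first_date`, `last_date` and `row_count` are made up for this illustration, and it assumes dates are unique within each name.

```sql
-- Sample data from the question, loaded into a hypothetical table variable @t
declare @t table (name varchar(20), subtitle varchar(20), [date] date);
insert into @t (name, subtitle, [date])
values
    ('ABC','excel','2018-07-07'),
    ('ABC','excel','2018-08-08'),
    ('ABC','ppt','2018-09-09'),
    ('ABC','ppt','2018-10-10'),
    ('ABC','excel','2018-11-11'),
    ('ABC','ppt','2018-12-12'),
    ('DEF','ppt','2018-12-31');

with nums as (
    -- 1 on the first row of each name and on every subtitle change, 0 otherwise
    select *,
           case when subtitle = lag(subtitle) over (partition by name order by [date])
                then 0 else 1 end as num
    from @t
),
grouped as (
    -- running total of the flags = island number within each name
    select *,
           sum(num) over (partition by name order by [date]) as Group_Number
    from nums
)
-- one row per island with its date range and size
select name, Group_Number, subtitle,
       min([date]) as first_date,
       max([date]) as last_date,
       count(*)    as row_count
from grouped
group by name, Group_Number, subtitle
order by name, Group_Number;

-- name  Group_Number  subtitle  first_date  last_date   row_count
-- ABC   1             excel     2018-07-07  2018-08-08  2
-- ABC   2             ppt       2018-09-09  2018-10-10  2
-- ABC   3             excel     2018-11-11  2018-11-11  1
-- ABC   4             ppt       2018-12-12  2018-12-12  1
-- DEF   1             ppt       2018-12-31  2018-12-31  1
```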
Upvotes: 2