Justin Klevs
Justin Klevs

Reputation: 651

Count consecutive duplicate values by group

I have searched the site a bit for a solution to this question but have been unable to find an answer that fits precisely what I am looking for. I am attempting to count consecutive duplicate values for each ID number, ordered by the date. My current table resembles the first 3 columns of the table below, whereas the fourth column in what I would like to add.

ID  |  date  | value  | consec_duplicates
1      1/1       1            0
1      1/2       2            0
1      1/3       2            1
1      1/4       2            2
1      1/5       3            0
1      1/6       3            1
2      1/14      1            0
2      1/15      2            0
2      1/16      3            0
2      1/17      3            1
2      1/18      4            0
2      1/19      5            0
3      1/4       1            0
3      1/5       2            0
3      1/6       2            1
3      1/7       2            2
3      1/8       2            3
3      1/9       3            0

Does anybody have any ideas of how to construct this fourth column? Thanks!

Upvotes: 0

Views: 1824

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271231

This is a gap-and-islands problem. One method is the difference of row_number()s to identify the groups.

select t.*,
       dense_rank() over (partition by id order by (seqnum - seqnum_value), value) as grp,
       row_number() over (partition by id, (seqnum - seqnum_value), value order by date) as grp_seqnum
from (select t.*,
             row_number() over (partition by id order by date) as seqnum,
             row_number() over (partition by id, value order by date) as seqnum_v
      from t
     ) t;

This is a bit tricky to understand the first time you see it. If you run the subquery and stare at the results long enough, you'll get why the difference is constant for adjacent values.

EDIT:

I think Jorge is right. Your data doesn't have the same value repeated, so you can just do:

select t.*,
       row_number() over (partition by id, value order by date) as grp_seqnum
from t;

Upvotes: 1

dnoeth
dnoeth

Reputation: 60513

When the values are actually increasing all the time then this should work:

row_number() over (partition by id, value order by date) - 1

Otherwise Teradata has an extension to Standard SQL for cases like this:

row_number() 
over (partition by id
      order by date
      RESET WHEN MIN(value)  -- previous value not equal to current 
                 OVER (partition by id
                       order by date
                       rows between 1 preceding and 1 preceding) <> value
     ) - 1

Upvotes: 1

Related Questions