Mel
Mel

Reputation: 43

Give unique identifier to consecutive groupings

I'm trying to identify groups based on sequential numbers. For example, I have a dataframe that looks like this (simplified):

UID
1
2
3
4
5
6
7
11
12
13
15
17
20
21
22

And I would like to add a column that identifies when there are groupings of consecutive numbers, for example, 1 to 7 are first consecutive , then they get 1 , the second consecutive set will get 2 etc .

UID Group
1   1
2   1
3   1
4   1
5   1
6   1
7   1
11  2
12  2
13  2
15  3
17  4
20  5
21  5
22  5

none of the existed code helped me to solved this issue

Upvotes: 1

Views: 116

Answers (2)

lmo
lmo

Reputation: 38500

Here is one base R method that uses diff, a logical check, and cumsum:

cumsum(c(1, diff(df$UID) > 1))
 [1] 1 1 1 1 1 1 1 2 2 2 3 4 5 5 5

Adding this onto the data.frame, we get:

df$id <- cumsum(c(1, diff(df$UID) > 1))
df
   UID id
1    1  1
2    2  1
3    3  1
4    4  1
5    5  1
6    6  1
7    7  1
8   11  2
9   12  2
10  13  2
11  15  3
12  17  4
13  20  5
14  21  5
15  22  5

Or you can also use dplyr as follows :

library(dplyr)
df %>% mutate(ID=cumsum(c(1, diff(df$UID) > 1)))
#   UID ID
#1    1  1
#2    2  1
#3    3  1
#4    4  1
#5    5  1
#6    6  1
#7    7  1
#8   11  2
#9   12  2
#10  13  2
#11  15  3
#12  17  4
#13  20  5
#14  21  5
#15  22  5

Upvotes: 1

akrun
akrun

Reputation: 886948

We can also get the difference between the current row and the previous row using the shift function from data.table, get the cumulative sum of the logical vector and assign it to create the 'Group' column. This will be faster.

library(data.table)
setDT(df1)[, Group := cumsum(UID- shift(UID, fill = UID[1])>1)+1]
df1
#    UID Group
# 1:   1     1
# 2:   2     1
# 3:   3     1
# 4:   4     1
# 5:   5     1
# 6:   6     1
# 7:   7     1
# 8:  11     2
# 9:  12     2
#10:  13     2
#11:  15     3
#12:  17     4
#13:  20     5
#14:  21     5
#15:  22     5

Upvotes: 0

Related Questions