marine8115
marine8115

Reputation: 598

How to label a sequence of same values

I would like to add an ID column for group but increment the ID by 1 when the value is repeated after having some other value in the interval. For example,consider the following column having values

Unique    Value
 jack       A
 jack       A
 jack       A
 bill       B
 bill       B
 rick       C
 jack       D
 jack       D
 bill       A
 rick       C
 jack       A
 jack       A
 jack       A

Intended Result:

    Unique    Value      ID
     jack       A        1
     jack       A        1
     jack       A        1
     bill       B        1
     bill       B        1
     rick       C        1
     jack       D        1
     jack       D        1
     bill       A        1
     rick       A        1
     jack       A        2
     jack       A        2
     jack       A        2 

I tried the following code from the data.table package:

DT[, ID := sequence(.N), by = Value]

This helps only in adding IDs to groups and ID is reset to 1 for the next occurrence. Here I would like to increment it by 1

Thank you for the help in advance. Any additional links to understand the above example are highly appreciated.

The final intention is to get a concatenated column which will read something like this jack_a_1 and jack_a_2 so that this column will act as an identifier for further steps.

Upvotes: 0

Views: 637

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389355

Here is one way using dplyr. We paste Unique and Value column together (col), create an incremental counter for every change in col, group_by Unique and Value together and create a ranking column with dense_rank.

library(dplyr)

df %>%
  mutate(col = paste(Unique, Value, sep ="_"), 
         group = cumsum(col != lag(col, default = first(col)))) %>%
  group_by(Unique, Value) %>%
  mutate(ID = dense_rank(group)) %>%
  select(-group)

#   Unique Value  col      ID
#   <fct> <fct>  <chr>  <int>
# 1 jack   A     jack_A     1
# 2 jack   A     jack_A     1
# 3 jack   A     jack_A     1
# 4 bill   B     bill_B     1
# 5 bill   B     bill_B     1
# 6 rick   C     rick_C     1
# 7 jack   D     jack_D     1
# 8 jack   D     jack_D     1
# 9 bill   A     bill_A     1
#10 rick   A     rick_A     1
#11 jack   A     jack_A     2
#12 jack   A     jack_A     2
#13 jack   A     jack_A     2

If not needed you can remove the col column in final output.

data

df <- structure(list(Unique = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 2L, 
2L, 1L, 3L, 2L, 2L, 2L), .Label = c("bill", "jack", "rick"), class = "factor"), 
Value = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 1L, 1L, 
1L, 1L, 1L), .Label = c("A", "B", "C", "D"), class = "factor")), row.names = c(NA, 
-13L), class = "data.frame")

Upvotes: 0

Bulat
Bulat

Reputation: 6979

You will can try using shift and cumsum in data.table to achieve this:

library(data.table)
dt <- read.table(text = "Value
A
A
A
B
B
C
A
A", header = TRUE)
dt <- data.table(dt)

dt[, change := shift(Value) != Value ]
dt[is.na(change), change := TRUE]
dt[, res := cumsum(change), by = Value]
#       Value  change res
# 1:     A     TRUE   1
# 2:     A     FALSE  1
# 3:     A     FALSE  1
# 4:     B     TRUE   1
# 5:     B     FALSE  1
# 6:     C     TRUE   1
# 7:     A     TRUE   2
# 8:     A     FALSE  2

UPDATE

To achieve this within extra column, based on updated example:

dt[, change := shift(Value) != Value, by = Unique]
dt[is.na(change), change := TRUE]
dt[, res := cumsum(change), by = .(Value, Unique)]

Upvotes: 3

Related Questions