Reputation: 598
I would like to add an ID column for group but increment the ID by 1 when the value is repeated after having some other value in the interval. For example,consider the following column having values
Unique Value
jack A
jack A
jack A
bill B
bill B
rick C
jack D
jack D
bill A
rick C
jack A
jack A
jack A
Intended Result:
Unique Value ID
jack A 1
jack A 1
jack A 1
bill B 1
bill B 1
rick C 1
jack D 1
jack D 1
bill A 1
rick A 1
jack A 2
jack A 2
jack A 2
I tried the following code from the data.table
package:
DT[, ID := sequence(.N), by = Value]
This helps only in adding IDs to groups and ID is reset to 1
for the next occurrence. Here I would like to increment it by 1
Thank you for the help in advance. Any additional links to understand the above example are highly appreciated.
The final intention is to get a concatenated column which will read something like this jack_a_1
and jack_a_2
so that this column will act as an identifier for further steps.
Upvotes: 0
Views: 637
Reputation: 389355
Here is one way using dplyr
. We paste
Unique
and Value
column together (col
), create an incremental counter for every change in col
, group_by
Unique
and Value
together and create a ranking column with dense_rank
.
library(dplyr)
df %>%
mutate(col = paste(Unique, Value, sep ="_"),
group = cumsum(col != lag(col, default = first(col)))) %>%
group_by(Unique, Value) %>%
mutate(ID = dense_rank(group)) %>%
select(-group)
# Unique Value col ID
# <fct> <fct> <chr> <int>
# 1 jack A jack_A 1
# 2 jack A jack_A 1
# 3 jack A jack_A 1
# 4 bill B bill_B 1
# 5 bill B bill_B 1
# 6 rick C rick_C 1
# 7 jack D jack_D 1
# 8 jack D jack_D 1
# 9 bill A bill_A 1
#10 rick A rick_A 1
#11 jack A jack_A 2
#12 jack A jack_A 2
#13 jack A jack_A 2
If not needed you can remove the col
column in final output.
data
df <- structure(list(Unique = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 2L,
2L, 1L, 3L, 2L, 2L, 2L), .Label = c("bill", "jack", "rick"), class = "factor"),
Value = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 1L, 1L,
1L, 1L, 1L), .Label = c("A", "B", "C", "D"), class = "factor")), row.names = c(NA,
-13L), class = "data.frame")
Upvotes: 0
Reputation: 6979
You will can try using shift
and cumsum
in data.table
to achieve this:
library(data.table)
dt <- read.table(text = "Value
A
A
A
B
B
C
A
A", header = TRUE)
dt <- data.table(dt)
dt[, change := shift(Value) != Value ]
dt[is.na(change), change := TRUE]
dt[, res := cumsum(change), by = Value]
# Value change res
# 1: A TRUE 1
# 2: A FALSE 1
# 3: A FALSE 1
# 4: B TRUE 1
# 5: B FALSE 1
# 6: C TRUE 1
# 7: A TRUE 2
# 8: A FALSE 2
UPDATE
To achieve this within extra column, based on updated example:
dt[, change := shift(Value) != Value, by = Unique]
dt[is.na(change), change := TRUE]
dt[, res := cumsum(change), by = .(Value, Unique)]
Upvotes: 3