Reputation: 1441
I have a dataframe that looks like this:
ID value condition
A 0 0
A 3 0
A 0 1
A 7 1
A 5 0
A 5 0
A 5 0
A 7 0
B 6 0
B 2 1
B 7 0
B 10 1
B 0 0
B 6 0
I want to change the ID name when the condition is met and change the name of the ID that follows too. The condition can be met multiple times per ID so I'd like to modify it each time.
The result would change the original ID or just add a new column:
ID value condition newID
A 0 0 A
A 3 0 A
A 0 1 A1
A 7 1 A1
A 5 0 A2
A 5 0 A2
A 5 0 A2
A 7 0 A2
B 6 0 B
B 2 1 B1
B 7 0 B2
B 10 1 B3
B 0 0 B4
B 6 0 B4
Upvotes: 2
Views: 343
Reputation: 42564
If I understand correctly, the OP wants to create subgroups within each ID
for each contiguous streak of condition
.
Unfortunately, the OP has requested to name the subgroups in a special way which makes the solutions overly complicated. By OP's request, the subgroups are to be named, e.g., A, A1, A2
which means that subgroup numbering and subgroup name is off by one, e.g., the second subgroup is named A1
, the third one A2
, etc.
In case a more simplified naming scheme would be acceptable, we can directly benefit from the prefix
parameter of the rleid()
function. Then, the first subgroup of group A
will be named A1
, the second A2
, etc.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(newID = data.table::rleid(condition, prefix = first(ID)))
# A tibble: 14 x 4 # Groups: ID [2] ID value condition newID <chr> <int> <int> <chr> 1 A 0 0 A1 2 A 3 0 A1 3 A 0 1 A2 4 A 7 1 A2 5 A 5 0 A3 6 A 5 0 A3 7 A 5 0 A3 8 A 7 0 A3 9 B 6 0 B1 10 B 2 1 B2 11 B 7 0 B3 12 B 10 1 B4 13 B 0 0 B5 14 B 6 0 B5
library(data.table)
setDT(df)[, newID := rleid(condition, prefix = ID), ID][]
ID value condition newID 1: A 0 0 A1 2: A 3 0 A1 3: A 0 1 A2 4: A 7 1 A2 5: A 5 0 A3 6: A 5 0 A3 7: A 5 0 A3 8: A 7 0 A3 9: B 6 0 B1 10: B 2 1 B2 11: B 7 0 B3 12: B 10 1 B4 13: B 0 0 B5 14: B 6 0 B5
library(data.table)
df <- fread("ID value condition
A 0 0
A 3 0
A 0 1
A 7 1
A 5 0
A 5 0
A 5 0
A 7 0
B 6 0
B 2 1
B 7 0
B 10 1
B 0 0
B 6 0")
Upvotes: 0
Reputation: 14764
Could also do:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(newID = cumsum(c(0, (condition != lag(condition))[-1])),
newID = ifelse(newID != 0, paste0(ID, newID), ID))
Output:
# A tibble: 14 x 4
# Groups: ID [2]
ID value condition newID
<chr> <int> <int> <chr>
1 A 0 0 A
2 A 3 0 A
3 A 0 1 A1
4 A 7 1 A1
5 A 5 0 A2
6 A 5 0 A2
7 A 5 0 A2
8 A 7 0 A2
9 B 6 0 B
10 B 2 1 B1
11 B 7 0 B2
12 B 10 1 B3
13 B 0 0 B4
14 B 6 0 B4
Upvotes: 3
Reputation: 28705
Same idea as @akrun but using only data.table
library(data.table)
setDT(df)
df[, newID := paste0(ID, gsub('^0$', '', rleid(condition) - 1)), ID]
df
# ID value condition newID
# 1: A 0 0 A
# 2: A 3 0 A
# 3: A 0 1 A1
# 4: A 7 1 A1
# 5: A 5 0 A2
# 6: A 5 0 A2
# 7: A 5 0 A2
# 8: A 7 0 A2
# 9: B 6 0 B
# 10: B 2 1 B1
# 11: B 7 0 B2
# 12: B 10 1 B3
# 13: B 0 0 B4
# 14: B 6 0 B4
Upvotes: 3
Reputation: 887671
One option after grouping by 'ID', create the index with rleid
(from data.table
) and change it to paste
the 'ID' based on the condition with case_when
library(dplyr)
library(data.table)
df1 %>%
group_by(ID) %>%
mutate(newID = rleid(condition)-1,
newID = case_when(newID == 0 ~ first(ID), TRUE ~ paste0(first(ID), newID)))
# A tibble: 14 x 4
# Groups: ID [2]
# ID value condition newID
# <chr> <int> <int> <chr>
# 1 A 0 0 A
# 2 A 3 0 A
# 3 A 0 1 A1
# 4 A 7 1 A1
# 5 A 5 0 A2
# 6 A 5 0 A2
# 7 A 5 0 A2
# 8 A 7 0 A2
# 9 B 6 0 B
#10 B 2 1 B1
#11 B 7 0 B2
#12 B 10 1 B3
#13 B 0 0 B4
#14 B 6 0 B4
df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B"), value = c(0L, 3L, 0L, 7L, 5L,
5L, 5L, 7L, 6L, 2L, 7L, 10L, 0L, 6L), condition = c(0L, 0L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame",
row.names = c(NA, -14L))
Upvotes: 6