adkane
adkane

Reputation: 1441

How do I change subsequent row values if some condition is met with multiple groups?

I have a dataframe that looks like this:

ID  value   condition
A   0         0
A   3         0
A   0         1
A   7         1
A   5         0
A   5         0
A   5         0
A   7         0
B   6         0
B   2         1
B   7         0
B   10        1
B   0         0
B   6         0

I want to change the ID name when the condition is met and change the name of the ID that follows too. The condition can be met multiple times per ID so I'd like to modify it each time.

The result would change the original ID or just add a new column:

ID  value   condition   newID
A   0              0    A
A   3              0    A
A   0              1    A1
A   7              1    A1
A   5              0    A2
A   5              0    A2
A   5              0    A2
A   7              0    A2
B   6              0    B
B   2              1    B1
B   7              0    B2
B   10             1    B3
B   0              0    B4
B   6              0    B4

Upvotes: 2

Views: 343

Answers (4)

Uwe
Uwe

Reputation: 42564

If I understand correctly, the OP wants to create subgroups within each ID for each contiguous streak of condition.

Unfortunately, the OP has requested to name the subgroups in a special way which makes the solutions overly complicated. By OP's request, the subgroups are to be named, e.g., A, A1, A2 which means that subgroup numbering and subgroup name is off by one, e.g., the second subgroup is named A1, the third one A2, etc.

In case a more simplified naming scheme would be acceptable, we can directly benefit from the prefix parameter of the rleid() function. Then, the first subgroup of group A will be named A1, the second A2, etc.

dplyr

library(dplyr)
df %>% 
  group_by(ID) %>% 
  mutate(newID = data.table::rleid(condition, prefix = first(ID)))
# A tibble: 14 x 4
# Groups:   ID [2]
   ID    value condition newID
   <chr> <int>     <int> <chr>
 1 A         0         0 A1   
 2 A         3         0 A1   
 3 A         0         1 A2   
 4 A         7         1 A2   
 5 A         5         0 A3   
 6 A         5         0 A3   
 7 A         5         0 A3   
 8 A         7         0 A3   
 9 B         6         0 B1   
10 B         2         1 B2   
11 B         7         0 B3   
12 B        10         1 B4   
13 B         0         0 B5   
14 B         6         0 B5

data.table

library(data.table)
setDT(df)[, newID := rleid(condition, prefix = ID), ID][]
    ID value condition newID
 1:  A     0         0    A1
 2:  A     3         0    A1
 3:  A     0         1    A2
 4:  A     7         1    A2
 5:  A     5         0    A3
 6:  A     5         0    A3
 7:  A     5         0    A3
 8:  A     7         0    A3
 9:  B     6         0    B1
10:  B     2         1    B2
11:  B     7         0    B3
12:  B    10         1    B4
13:  B     0         0    B5
14:  B     6         0    B5

Data

library(data.table)
df <- fread("ID  value   condition
A   0         0
A   3         0
A   0         1
A   7         1
A   5         0
A   5         0
A   5         0
A   7         0
B   6         0
B   2         1
B   7         0
B   10        1
B   0         0
B   6         0")

Upvotes: 0

arg0naut91
arg0naut91

Reputation: 14764

Could also do:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(newID = cumsum(c(0, (condition != lag(condition))[-1])),
         newID = ifelse(newID != 0, paste0(ID, newID), ID))

Output:

# A tibble: 14 x 4
# Groups:   ID [2]
   ID    value condition newID
   <chr> <int>     <int> <chr>
 1 A         0         0 A    
 2 A         3         0 A    
 3 A         0         1 A1   
 4 A         7         1 A1   
 5 A         5         0 A2   
 6 A         5         0 A2   
 7 A         5         0 A2   
 8 A         7         0 A2   
 9 B         6         0 B    
10 B         2         1 B1   
11 B         7         0 B2   
12 B        10         1 B3   
13 B         0         0 B4   
14 B         6         0 B4  

Upvotes: 3

IceCreamToucan
IceCreamToucan

Reputation: 28705

Same idea as @akrun but using only data.table

library(data.table)
setDT(df)

df[, newID := paste0(ID, gsub('^0$', '', rleid(condition) - 1)), ID]
df
#     ID value condition newID
#  1:  A     0         0     A
#  2:  A     3         0     A
#  3:  A     0         1    A1
#  4:  A     7         1    A1
#  5:  A     5         0    A2
#  6:  A     5         0    A2
#  7:  A     5         0    A2
#  8:  A     7         0    A2
#  9:  B     6         0     B
# 10:  B     2         1    B1
# 11:  B     7         0    B2
# 12:  B    10         1    B3
# 13:  B     0         0    B4
# 14:  B     6         0    B4

Upvotes: 3

akrun
akrun

Reputation: 887671

One option after grouping by 'ID', create the index with rleid (from data.table) and change it to paste the 'ID' based on the condition with case_when

library(dplyr)
library(data.table)
df1 %>% 
   group_by(ID) %>%
   mutate(newID = rleid(condition)-1,
          newID = case_when(newID == 0 ~ first(ID), TRUE ~ paste0(first(ID), newID)))
# A tibble: 14 x 4
# Groups:   ID [2]
#   ID    value condition newID
#   <chr> <int>     <int> <chr>
# 1 A         0         0 A    
# 2 A         3         0 A    
# 3 A         0         1 A1   
# 4 A         7         1 A1   
# 5 A         5         0 A2   
# 6 A         5         0 A2   
# 7 A         5         0 A2   
# 8 A         7         0 A2   
# 9 B         6         0 B    
#10 B         2         1 B1   
#11 B         7         0 B2   
#12 B        10         1 B3   
#13 B         0         0 B4   
#14 B         6         0 B4   

data

df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A", 
 "B", "B", "B", "B", "B", "B"), value = c(0L, 3L, 0L, 7L, 5L, 
 5L, 5L, 7L, 6L, 2L, 7L, 10L, 0L, 6L), condition = c(0L, 0L, 1L, 
 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame", 
 row.names = c(NA, -14L))

Upvotes: 6

Related Questions