Reputation: 607
I need to assign subgroup IDs given a group ID and an indicator showing the beginning of the new subgroup. Here's a test dataset:
group <- c(rep("A", 8), rep("B", 8))
x1 <- c(rep(0, 3), rep(1, 3), rep(0, 2))
x2 <- rep(0:1, 4)
df <- data.frame(group=group, indic=c(x1, x2))
Here is the resulting data frame:
df
group indic
1 A 0
2 A 0
3 A 0
4 A 1
5 A 1
6 A 1
7 A 0
8 A 0
9 B 0
10 B 1
11 B 0
12 B 1
13 B 0
14 B 1
15 B 0
16 B 1
indic==1 means that row is the beginning of a new subgroup, and the subgroup should be numbered 1 higher than the previous subgroup. Where indic==0, the subgroup should be the same as the previous subgroup. The subgroup numbering starts at 1. When the group variable changes, the subgroup numbering resets to 1. I would like to use the tidyverse framework.
Here is the result that I want:
df
group indic subgroup
1 A 0 1
2 A 0 1
3 A 0 1
4 A 1 2
5 A 1 3
6 A 1 4
7 A 0 4
8 A 0 4
9 B 0 1
10 B 1 2
11 B 0 2
12 B 1 3
13 B 0 3
14 B 1 4
15 B 0 4
16 B 1 5
I would like to show some methods that I have already tried, but I haven't been able to find anything even close. Any help will be appreciated.
Upvotes: 1
Views: 62
Reputation: 206187
You can just use
library(dplyr)
df %>% group_by(group) %>%
mutate(subgroup=cumsum(indic)+1)
# group indic subgroup
# <fct> <dbl> <dbl>
# 1 A 0 1
# 2 A 0 1
# 3 A 0 1
# 4 A 1 2
# 5 A 1 3
# 6 A 1 4
# 7 A 0 4
# 8 A 0 4
# 9 B 0 1
# 10 B 1 2
# 11 B 0 2
# 12 B 1 3
# 13 B 0 3
# 14 B 1 4
# 15 B 0 4
# 16 B 1 5
We use dplyr to do the grouping, and then we just use cumsum, which takes the cumulative sum of the indic column within each group, so the value increases by one each time it sees a 1. Adding 1 makes the numbering start at 1 instead of 0.
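To see why this works, it may help to look at cumsum on a single group first. The sketch below is only illustrative: it rebuilds the test data from the question, applies cumsum to group A's indicator by hand, and then shows one possible base R equivalent using ave() for comparison. The names indic_a, subgroup_a and subgroup_base are made up for this example.

```r
# Rebuild the test data from the question
group <- c(rep("A", 8), rep("B", 8))
x1 <- c(rep(0, 3), rep(1, 3), rep(0, 2))
x2 <- rep(0:1, 4)
df <- data.frame(group = group, indic = c(x1, x2))

# Indicator values for group "A" only (hypothetical helper name)
indic_a <- df$indic[df$group == "A"]

# cumsum() counts how many subgroup starts have been seen so far;
# adding 1 shifts the numbering so the first subgroup is 1
subgroup_a <- cumsum(indic_a) + 1
print(subgroup_a)

# A base R alternative to group_by() + mutate(), assuming ave() is acceptable:
# ave() applies cumsum separately within each level of df$group
subgroup_base <- ave(df$indic, df$group, FUN = cumsum) + 1
print(subgroup_base)
```

Because the cumulative sum is restarted for each group, the numbering resets to 1 when the group changes, which is what group_by() does in the dplyr version.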
Upvotes: 1