Reputation: 808
Say I have a dataset like this:
id <- rep(1, 9)
start_over <- c(rep(NA, 3), "yes", NA, "yes", rep(NA, 3))
dat <- data.frame(id, start_over)
I.e.,
id start_over
1 1 NA
2 1 NA
3 1 NA
4 1 yes
5 1 NA
6 1 yes
7 1 NA
8 1 NA
9 1 NA
How would I create a new variable that increments by one each time start_over is "yes"?
i.e.,
id start_over assignment
1 1 NA 1
2 1 NA 1
3 1 NA 1
4 1 yes 2
5 1 NA 2
6 1 yes 3
7 1 NA 3
8 1 NA 3
9 1 NA 3
Upvotes: 1
Views: 123
Reputation: 35584
NA values can be identified with the is.na() function; then cumsum() the logical values. This works here because every non-NA value in start_over is "yes".
library(dplyr)
dat %>% mutate(x = cumsum(!is.na(start_over)) + 1)
# id start_over x
# 1 1 <NA> 1
# 2 1 <NA> 1
# 3 1 <NA> 1
# 4 1 yes 2
# 5 1 <NA> 2
# 6 1 yes 3
# 7 1 <NA> 3
# 8 1 <NA> 3
# 9 1 <NA> 3
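The !is.na() approach counts every non-NA entry, not only "yes". The sketch below assumes a hypothetical vector containing a "no" value (not in the question's data) to show where it diverges from matching "yes" explicitly:

```r
# Hypothetical input: includes a "no" that the question's data does not have
start_over <- c(NA, "yes", "no", NA, "yes")

# Counts every non-NA entry, so "no" also increments the counter
by_not_na <- cumsum(!is.na(start_over)) + 1

# Counts only entries equal to "yes"; NA %in% "yes" is FALSE, never NA
by_yes <- cumsum(start_over %in% "yes") + 1

print(by_not_na)  # [1] 1 2 3 3 4
print(by_yes)     # [1] 1 2 2 2 3
```

If start_over can only ever be NA or "yes", both give the same result.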
Upvotes: 1
Reputation: 83235
A small improvement on my comment:
dat$assignment <- cumsum(dat$start_over %in% "yes") + 1
which gives:
> dat
  id start_over assignment
1  1       <NA>          1
2  1       <NA>          1
3  1       <NA>          1
4  1        yes          2
5  1       <NA>          2
6  1        yes          3
7  1       <NA>          3
8  1       <NA>          3
9  1       <NA>          3
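The reason %in% is safer than == here is NA handling: == returns NA for NA inputs, and cumsum() propagates the first NA through every later element. A minimal sketch on the question's start_over vector:

```r
start_over <- c(rep(NA, 3), "yes", NA, "yes", rep(NA, 3))

# `==` returns NA wherever start_over is NA
eq <- start_over == "yes"
# `%in%` returns FALSE for NA, because NA is not in "yes"
inn <- start_over %in% "yes"

print(cumsum(eq) + 1)   # [1] NA NA NA NA NA NA NA NA NA
print(cumsum(inn) + 1)  # [1] 1 1 1 2 2 3 3 3 3
```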
Upvotes: 5
Reputation: 10761
We can use the cumsum() function:
cumsum(dat$start_over == "yes" & !is.na(dat$start_over)) + 1
# [1] 1 1 1 2 2 3 3 3 3
This checks whether start_over equals "yes" and is not NA. Where both conditions hold, the element is TRUE, which cumsum() counts as 1; otherwise it is FALSE, counted as 0. We add 1 to the result of cumsum() because otherwise the assignment would start at 0.
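The combined condition never contains NA because NA & FALSE evaluates to FALSE in R, so the !is.na() guard overrides the NA that == produces. A small sketch on a short hypothetical vector:

```r
x <- c(NA, "yes", NA, "yes")

# NA == "yes" is NA, but NA & FALSE is FALSE, so no NA survives
cond <- x == "yes" & !is.na(x)
print(cond)              # [1] FALSE  TRUE FALSE  TRUE

# TRUE counts as 1 and FALSE as 0, so cumsum() counts the "yes" rows
print(cumsum(cond))      # [1] 0 1 1 2
print(cumsum(cond) + 1)  # [1] 1 2 2 3
```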
dat$assignment <- cumsum(dat$start_over == "yes" & !is.na(dat$start_over)) + 1
dat
# id start_over assignment
# 1 1 <NA> 1
# 2 1 <NA> 1
# 3 1 <NA> 1
# 4 1 yes 2
# 5 1 <NA> 2
# 6 1 yes 3
# 7 1 <NA> 3
# 8 1 <NA> 3
# 9 1 <NA> 3
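The question's data has a single id. If the counter should instead restart for each id (an assumption, not something the question asks for), base R's ave() can apply the same cumsum() within each group. The two-id data frame dat2 below is hypothetical:

```r
# Hypothetical data with two ids; the question's data only has id = 1
dat2 <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  start_over = c(NA, "yes", NA, NA, NA, "yes"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each id group and keeps the original row order
dat2$assignment <- ave(
  as.integer(dat2$start_over %in% "yes"),
  dat2$id,
  FUN = function(v) cumsum(v) + 1L
)
print(dat2$assignment)  # [1] 1 2 2 1 1 2
```

With dplyr, the equivalent would be adding group_by(id) before the mutate() call.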
Upvotes: 2