Jacob Curtis

Reputation: 808

How do I create a variable that increments by 1 based on the value of another variable?

Say I have a dataset like this:

id <- rep(1, 9)
start_over <- c(rep(NA, 3), "yes", NA, "yes", rep(NA, 3))
dat <- data.frame(id, start_over)

I.e.,

    id  start_over
1   1   NA
2   1   NA 
3   1   NA
4   1   yes
5   1   NA
6   1   yes
7   1   NA
8   1   NA
9   1   NA

How would I create a new variable that increments by one each time start_over is "yes"?

i.e.,

    id  start_over   assignment
1   1   NA           1
2   1   NA           1
3   1   NA           1
4   1   yes          2
5   1   NA           2
6   1   yes          3
7   1   NA           3
8   1   NA           3
9   1   NA           3

Upvotes: 1

Views: 123

Answers (3)

Darren Tsai

Reputation: 35584

You can identify the NA values with is.na(), negate the result, and take the cumsum() of the resulting logical vector. Note that this counts every non-NA value, not only "yes".

library(dplyr)
dat %>% mutate(x = cumsum(!is.na(start_over)) + 1)

#   id start_over x
# 1  1       <NA> 1
# 2  1       <NA> 1
# 3  1       <NA> 1
# 4  1        yes 2
# 5  1       <NA> 2
# 6  1        yes 3
# 7  1       <NA> 3
# 8  1       <NA> 3
# 9  1       <NA> 3
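One caveat may be worth illustrating: `!is.na(start_over)` counts every non-missing value, so the result would differ from a "yes"-only counter if the column ever held other values. A minimal sketch, assuming a hypothetical vector `flag` that contains a "no" entry (this is not part of the original data):

```r
# Hypothetical column with a non-NA value other than "yes" (assumption for illustration)
flag <- c(NA, "yes", "no", NA, "yes")

# Counts every non-NA entry, so "no" also bumps the counter
any_value <- cumsum(!is.na(flag)) + 1

# Counts only the entries equal to "yes"
yes_only <- cumsum(flag %in% "yes") + 1

print(any_value)
print(yes_only)
```

For the data in the question the two approaches agree, since "yes" is the only non-NA value there.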

Upvotes: 1

Jaap

Reputation: 83235

A small improvement on my comment:

dat$assignment <- cumsum(dat$start_over %in% "yes") + 1

which gives:

> dat
  id start_over assignment
1  1       <NA>          1
2  1       <NA>          1
3  1       <NA>          1
4  1        yes          2
5  1       <NA>          2
6  1        yes          3
7  1       <NA>          3
8  1       <NA>          3
9  1       <NA>          3
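The reason `%in%` works here without an explicit NA check is that it always returns TRUE or FALSE, whereas `==` returns NA when compared against a missing value. A small sketch of the difference, using a hypothetical two-element vector `x`:

```r
# Hypothetical vector with one missing value and one "yes" (illustration only)
x <- c(NA, "yes")

# Comparison with == keeps the NA
eq_result <- x == "yes"

# %in% never returns NA; a missing value simply does not match "yes"
in_result <- x %in% "yes"

print(eq_result)
print(in_result)
```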

Upvotes: 5

bouncyball

Reputation: 10761

We can use the cumsum function:

cumsum(dat$start_over == "yes" & !is.na(dat$start_over)) + 1
# [1] 1 1 1 2 2 3 3 3 3

This checks whether start_over equals "yes" and is not NA. Where both conditions hold the result is TRUE, otherwise FALSE, and cumsum() treats these as 1 and 0. We add 1 to the cumsum because otherwise the assignment would start at 0.
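To see why the `!is.na()` guard matters, here is a minimal sketch comparing the guarded and unguarded versions on the same values as the question's start_over column. The names `unguarded` and `guarded` are only illustrative:

```r
# Same values as the start_over column in the question
start_over <- c(rep(NA, 3), "yes", NA, "yes", rep(NA, 3))

# Without the guard, the comparison yields NA and cumsum() carries it forward
unguarded <- cumsum(start_over == "yes")

# With the guard, NA & FALSE evaluates to FALSE, so no NA reaches cumsum()
guarded <- cumsum(start_over == "yes" & !is.na(start_over)) + 1

print(unguarded)
print(guarded)
```

Because the first element is already NA, the unguarded version is NA throughout, while the guarded one gives the desired counter.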

dat$assignment <- cumsum(dat$start_over == "yes" & !is.na(dat$start_over)) + 1

#   id start_over assignment
# 1  1       <NA>          1
# 2  1       <NA>          1
# 3  1       <NA>          1
# 4  1        yes          2
# 5  1       <NA>          2
# 6  1        yes          3
# 7  1       <NA>          3
# 8  1       <NA>          3
# 9  1       <NA>          3

Upvotes: 2
