Creating a new variable while using subsequent values in R

Question

I have the following data frame:

df1 <- data.frame(id = rep(1:3, each = 5), 
                  time = rep(1:5),
                  y = c(rep(1, 4), 0, 1, 0, 1, 1, 0, 0, 1, rep(0,3)))

df1
##    id time y
## 1   1    1 1
## 2   1    2 1
## 3   1    3 1
## 4   1    4 1
## 5   1    5 0
## 6   2    1 1
## 7   2    2 0
## 8   2    3 1
## 9   2    4 1
## 10  2    5 0
## 11  3    1 0
## 12  3    2 1
## 13  3    3 0
## 14  3    4 0
## 15  3    5 0

I'd like to create a new indicator variable that tells me, for each of the three ids, at what point y = 0 for all subsequent responses. In the example above, for ids 1 and 2 this occurs at the 5th time point, and for id 3 this occurs at the 3rd time point.

I'm getting tripped up on id 2, where y = 0 at time point 2 but then goes back to 1. I'd like the indicator variable to take subsequent time points into account.

Essentially, I'm looking for the following output:

df1
##    id time y new_col
## 1   1    1 1       0
## 2   1    2 1       0
## 3   1    3 1       0
## 4   1    4 1       0
## 5   1    5 0       1
## 6   2    1 1       0
## 7   2    2 0       0
## 8   2    3 1       0
## 9   2    4 1       0
## 10  2    5 0       1
## 11  3    1 0       0
## 12  3    2 1       0
## 13  3    3 0       1
## 14  3    4 0       1
## 15  3    5 0       1

The new_col variable indicates whether y = 0 at that time point and at all subsequent time points.

talat · Accepted Answer

I would use a little helper function for that.

foo <- function(x, val) {
  # one past the last position where x differs from val
  pos <- max(which(x != val)) + 1
  # 1 from that position onwards, 0 before it
  as.integer(seq_along(x) >= pos)
}

library(dplyr)  # provides %>%, group_by() and mutate()

df1 %>% 
  group_by(id) %>% 
  mutate(indicator = foo(y, 0))

# # A tibble: 15 x 4
# # Groups:   id [3]
#       id  time     y indicator
#    <int> <int> <dbl>     <int>
#  1     1     1     1         0
#  2     1     2     1         0
#  3     1     3     1         0
#  4     1     4     1         0
#  5     1     5     0         1
#  6     2     1     1         0
#  7     2     2     0         0
#  8     2     3     1         0
#  9     2     4     1         0
# 10     2     5     0         1
# 11     3     1     0         0
# 12     3     2     1         0
# 13     3     3     0         1
# 14     3     4     0         1
# 15     3     5     0         1
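To see why the helper handles id 2 correctly, it may help to trace its steps on that group's y values alone. This is only an illustrative walk-through of the same logic as foo; the variable names nonzero, pos and indicator are made up for the example.

```r
# y values for id 2, copied from the question
y <- c(1, 0, 1, 1, 0)

# Positions where y differs from 0
nonzero <- which(y != 0)            # 1 3 4

# One past the last non-zero position: where the trailing run of zeros starts
pos <- max(nonzero) + 1             # 5

# Flag every position from pos onwards
indicator <- as.integer(seq_along(y) >= pos)
indicator                           # 0 0 0 0 1
```

The zero at time point 2 is not flagged because a later non-zero value (at time point 4) pushes pos past it.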

If y may contain NA values, you can adjust foo to:

foo <- function(x, val) {
  pos <- max(which(x != val | is.na(x))) +1
  as.integer(seq_along(x) >= pos)
}

That way, an NA is treated like a non-zero value: any y = 0 that is followed by an NA keeps an indicator of 0.
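One case foo does not cover cleanly is a group in which every y equals 0: which() then returns an empty vector, so max() emits a warning and returns -Inf (the resulting indicator is still all 1s). As a sketch of an alternative, not part of the answer above, the same indicator can be built with a reversed cumulative product in base R. The name trailing_zeros is hypothetical, and NA is assumed to count as non-zero, as in the adjusted foo.

```r
# Hypothetical helper: 1 where x equals val here and at every later position
trailing_zeros <- function(x, val = 0) {
  is_val <- !is.na(x) & x == val           # NA is treated as "not val"
  # Walking backwards, the running product stays 1 until the first non-val
  as.integer(rev(cumprod(rev(is_val))))
}

# Data from the question
df1 <- data.frame(id = rep(1:3, each = 5),
                  time = rep(1:5, times = 3),
                  y = c(rep(1, 4), 0, 1, 0, 1, 1, 0, 0, 1, rep(0, 3)))

# Apply per id with base R's ave(); no dplyr needed
df1$new_col <- ave(df1$y, df1$id, FUN = trailing_zeros)

trailing_zeros(c(0, 0, 0))      # all-zero group: 1 1 1, no warning
trailing_zeros(c(1, 0, NA, 0))  # NA breaks the run: 0 0 0 1
```

Because ave() writes the results back into a copy of y, new_col comes out as a numeric column rather than an integer one; wrap it in as.integer() if the type matters.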
