Finding the first non-zero year in data frame for multiple variables using tidyverse

Question

I have the following data:

library(tidyverse)
set.seed(1)
test <- data.frame(id = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                   Year = 2000 + c(1,3,5,2,3,5,6,1,2,3,4,5),
                   var1 = sample(0:2, replace = TRUE, size = 12, prob = c(0.6, 0.3, 0.1)),
                   var2 = sample(0:2, replace = TRUE, size = 12, prob = c(0.6, 0.3, 0.1)))

I need to the first year that each variable (var1 and var2) is non-zero within each id group.

I know how to find the row number of the first non-zero row:

temp <- function(a) ifelse(length(head(which(a>0),1))==0,0,head(which(a>0),1))

test2 <- test %>% group_by(id) %>% 
mutate_at(vars(var1:var2),funs(temp)) %>%
filter(row_number()==1) %>% select (-year)

     id  var1  var2
1     1   0     1
2     2   1     2
3     3   1     1

However, I am not sure how to match the row number back to the year variable so that I will know exactly when did the var1 and var2 turn non-zero, instead of only having the row numbers.

This is what I want:

     id  var1  var2
1     1   0     2001
2     2   2002  2003
3     3   2001  2001

Julius Vainora · Accepted Answer

We may do the following:

test %>% group_by(id) %>% summarise_at(vars(var1:var2), funs(Year[. > 0][1]))
# A tibble: 3 x 3
#      id  var1  var2
#     
# 1     1    NA  2001
# 2     2  2002  2003
# 3     3  2001  2001

That is, . > 0 gives a logical vector with TRUE whenever a value is positive, then we select all the corresponding years, and lastly pick only the first one.

That's very similar to your approach. Notice that due to using summarise I no longer need filter(row_number()==1) %>% select (-year). Also, my function corresponding to temp is more concise.

Finding the first non-zero year in data frame for multiple variables using tidyverse

Answers (2)

Related Questions