EML

Reputation: 671

Fill non-missing values if value occurs in grouped data using dplyr

For a given ID, I would like to set every value to "yes" if a "yes" is present in any year, and to "no" if no year has a "yes". Missing values should be filled in the same way. Here is an example:

data <- data.frame(
  id=c(1,1,2,2,3,3,4,4,5,5),
  year=rep(c(2010, 2011), 5),
  employ=c("yes", "yes", "no", "yes", "yes", "no", NA, "yes", "no", NA))

> data
   id year employ
1   1 2010    yes
2   1 2011    yes
3   2 2010     no
4   2 2011    yes
5   3 2010    yes
6   3 2011     no
7   4 2010   <NA>
8   4 2011    yes
9   5 2010     no
10  5 2011   <NA>

Desired output:

data2 <- data.frame(
  id=c(1,1,2,2,3,3,4,4,5,5),
  year=c(2010, 2011, 2010, 2011, 2010, 2011, 2010, 2011, 2010, 2011),
  employ=c("yes", "yes", "yes", "yes", "yes", "yes","yes", "yes","no", "no"))

> data2
   id year employ
1   1 2010    yes
2   1 2011    yes
3   2 2010    yes
4   2 2011    yes
5   3 2010    yes
6   3 2011    yes
7   4 2010    yes
8   4 2011    yes
9   5 2010     no
10  5 2011     no

Upvotes: 1

Views: 54

Answers (2)

Charlie Gallagher

Reputation: 616

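You can group by id and use any() after converting NA to "no". Note that replace_na() comes from tidyr, so that package needs to be loaded alongside dplyr.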

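library(dplyr)
library(tidyr)  # provides replace_na()

data %>% 
  group_by(id) %>% 
  mutate(
    # treat missing values as "no" before testing the group
    employ = replace_na(employ, "no"),
    # any() is evaluated once per id, so its result fills the whole group
    employ = case_when(any(employ == "yes") ~ "yes",
                       TRUE ~ "no")
  ) %>% 
  ungroup()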

#      id  year employ
# 1     1  2010 yes   
# 2     1  2011 yes   
# 3     2  2010 yes   
# 4     2  2011 yes   
# 5     3  2010 yes   
# 6     3  2011 yes   
# 7     4  2010 yes   
# 8     4  2011 yes   
# 9     5  2010 no    
# 10    5  2011 no
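
As a small illustration of how any() interacts with missing values, here is a base R sketch. The vectors `with_yes` and `no_yes` are made-up stand-ins for a single id's employ values, and the NA replacement uses plain indexing in place of replace_na() so the snippet needs no packages.

```r
# Hypothetical per-group vectors, standing in for one id's `employ` values
with_yes <- c(NA, "yes")
no_yes   <- c("no", NA)

any(with_yes == "yes")  # TRUE: one TRUE is enough, the NA is ignored
any(no_yes == "yes")    # NA: no TRUE found, and the NA leaves the answer unknown

# Replacing NA with "no" first (base R equivalent of replace_na(x, "no"))
no_yes[is.na(no_yes)] <- "no"
any(no_yes == "yes")    # FALSE
```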

Upvotes: 1

akrun

Reputation: 887851

One option is to convert employ to a factor with the levels given in priority order ('yes' before 'no'), drop the unused levels within each group, and take the first remaining level:

library(dplyr)
data %>%
   group_by(id) %>%
   mutate(employ = levels(droplevels(factor(employ, 
         levels = c('yes', 'no'))))[1]) %>%
   ungroup

Output:

# A tibble: 10 x 3
#      id  year employ
#   <dbl> <dbl> <chr> 
# 1     1  2010 yes   
# 2     1  2011 yes   
# 3     2  2010 yes   
# 4     2  2011 yes   
# 5     3  2010 yes   
# 6     3  2011 yes   
# 7     4  2010 yes   
# 8     4  2011 yes   
# 9     5  2010 no    
#10     5  2011 no    

If every value for a particular 'id' is NA, this returns NA for that 'id'.
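
The per-group expression is plain base R, so its behaviour can be checked on small vectors without dplyr. In this sketch, `first_level` is a hypothetical helper name that simply wraps the expression from the code above.

```r
# Hypothetical helper wrapping the per-group expression used in mutate()
first_level <- function(x) {
  levels(droplevels(factor(x, levels = c("yes", "no"))))[1]
}

first_level(c("no", "yes"))  # "yes": both levels are used, "yes" comes first
first_level(c("no", NA))     # "no": the unused "yes" level is dropped
first_level(c(NA, NA))       # NA: no levels remain after droplevels()
```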


Alternatively, use an if/else condition evaluated once per group:

data %>%
   group_by(id) %>% 
   mutate(employ = if('yes' %in% employ) 'yes' else 'no') %>%
   ungroup
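
One difference from the factor approach: %in% never returns NA, so an 'id' whose values are all NA ends up as 'no' rather than NA. A minimal base R check follows, where `collapse_employ` is a hypothetical helper name wrapping the same condition.

```r
# Hypothetical helper wrapping the if/else condition used in mutate()
collapse_employ <- function(x) if ("yes" %in% x) "yes" else "no"

collapse_employ(c(NA, "yes"))  # "yes"
collapse_employ(c("no", NA))   # "no"
collapse_employ(c(NA, NA))     # "no": an all-NA group is treated as "no"
```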

Upvotes: 1
