R refill NA values within group using dplyr

Question

I have the following data frame:

library(dplyr)

dat <- tibble(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
                     3L, 5L, 5L, 7L, 7L, 7L, 8L, 8L, 8L, 10L), 
              wish1 = c(4L, NA, NA, 1L, NA, 1L, NA, NA, NA, 
                        NA, -1L, 8L, NA, 1L, -1L, NA, 4L, 
                        NA, NA, -1L), 
              wish2 = c(1L, NA, NA, 1L, NA, 1L, NA, NA, NA, 
                        NA, -1L, 1L, NA, 2L, -1L, NA, 2L, NA, NA, 1L), 
              participate = c(NA, 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, 
                              NA, 1L, NA, 4L, NA, NA, NA, 1L, NA))

Within each group (defined by id), I want to replace the NAs in the variable participate with the non-missing value available in the same group. If a group has no non-missing value, the NAs can stay.

I need something like:

df <- dat %>% group_by(id) %>% 
    mutate(participate = (participate, na.rm = TRUE))

Unfortunately, this doesn't work because there is no function, such as sum, wrapped around participate.

www · Accepted Answer

There are probably more concise or elegant ways, but I would like to share some thoughts.

Solution 1: Use the fill function from tidyr

library(tidyr)

# arrange() sorts NA last within each id, so fill() can carry the
# non-NA value down into the NAs that follow it
dat2 <- dat %>%
  arrange(id, participate) %>%
  group_by(id) %>%
  fill(participate)
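
If the original row order matters, a possible variant is to skip arrange and let fill work in both directions. The sketch below assumes tidyr 1.0.0 or later (for .direction = "downup") and uses only the id and participate columns from the question; the names dat_small and filled_downup are made up for this illustration. Note that if a group held several different non-NA values, this would fill from the nearest one rather than from the smallest, so it may differ from the sort-based approach.

```r
library(dplyr)
library(tidyr)

# id and participate copied from the question's data (other columns omitted)
dat_small <- tibble(
  id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
         3L, 5L, 5L, 7L, 7L, 7L, 8L, 8L, 8L, 10L),
  participate = c(NA, 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA,
                  NA, 1L, NA, 4L, NA, NA, NA, 1L, NA)
)

# Fill downwards first, then upwards, within each id; rows are not reordered
filled_downup <- dat_small %>%
  group_by(id) %>%
  fill(participate, .direction = "downup") %>%
  ungroup()
```

Groups without any non-NA value (here id 10) are left as NA.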

Solution 2: Determine the fill values, then use left_join

# dat_temp is a summary data frame showing the fill values
dat_temp <- dat %>%
  arrange(id, participate) %>%
  group_by(id) %>%
  slice(1) %>%
  select(id, participate)

# Join dat_temp to dat, then keep only the filled participate column
dat2 <- dat %>%
  left_join(dat_temp, by = "id") %>%
  select(-participate.x) %>%
  rename(participate = participate.y)
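
As a small variation on the same idea, the incomplete column can be dropped before the join, so no participate.x / participate.y pair is created and no rename is needed. This is only a sketch: it uses the id and participate columns from the question, and the names dat_small, fill_values and joined are made up for this illustration.

```r
library(dplyr)

# id and participate copied from the question's data (other columns omitted)
dat_small <- tibble(
  id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
         3L, 5L, 5L, 7L, 7L, 7L, 8L, 8L, 8L, 10L),
  participate = c(NA, 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA,
                  NA, 1L, NA, 4L, NA, NA, NA, 1L, NA)
)

# One row per id: arrange() puts NA last, so slice(1) picks a
# non-NA value whenever the group has one
fill_values <- dat_small %>%
  arrange(id, participate) %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup() %>%
  select(id, participate)

# Drop the incomplete column first, so the join adds a single participate column
joined <- dat_small %>%
  select(-participate) %>%
  left_join(fill_values, by = "id")
```

With more columns in the data, participate ends up as the last column after the join, which may or may not matter.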

Solution 3: Sort the data frame then fill the NA based on the first value

This solution is based on a comment from alistaire.

dat2 <- dat %>% 
  arrange(id, participate) %>%
  group_by(id) %>% 
  mutate(participate = first(participate))
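
The sorting step could probably be avoided by picking the first non-NA value directly with base R subsetting, which keeps the rows in their original order. This is a sketch rather than part of the original answer; it uses the id and participate columns from the question, and dat_small and filled_in_place are names made up for this illustration.

```r
library(dplyr)

# id and participate copied from the question's data (other columns omitted)
dat_small <- tibble(
  id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
         3L, 5L, 5L, 7L, 7L, 7L, 8L, 8L, 8L, 10L),
  participate = c(NA, 1L, NA, NA, 1L, NA, NA, 1L, NA, NA, NA,
                  NA, 1L, NA, 4L, NA, NA, NA, 1L, NA)
)

# x[!is.na(x)][1] is the first non-NA value of the group, or NA if the
# group has none; mutate() recycles that single value across the group
filled_in_place <- dat_small %>%
  group_by(id) %>%
  mutate(participate = participate[!is.na(participate)][1]) %>%
  ungroup()
```

Like Solution 3, this overwrites every value in the group with one value, so it only makes sense when each group has at most one distinct non-NA value, as in the example data.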
