R: Replace missing values in one row based on unique ID for all variables in dataframe

Question

I'm working with a "long" dataframe in which unique participants have multiple observations over time. Some rows are missing data across multiple columns, and those missing records need to be filled with the same data for each participant.

My data set looks like:

list(
ID = c("A23", "A23", "A24", "A24", "A24", "A50", "A50", "A60", "A60"), 
A = c("Y", "Y", "N", "N", NA, "Y", NA, "N", "N"), 
B = c(2.4, 2.4, 2.6, 2.6, NA, 2.2, NA, 3.3, 3.3), 
C = c("M", "M", "F", "F", NA, "F", NA, "M", "M"), 
D = c(1, 1, 0, 0, NA, 1, NA, 1, 1), 
E = c("2015-02-16", "2017-10-05", "2014-03-26", "2016-09-07", "2018-10-08", "2015-02-17", "2017-10-06", "2014-03-27", "2016-09-08"))

I would like to be able to simply copy the data from the complete/non-missing rows and paste it into the missing records based on the ID variable, across the entire dataframe.

I have tried the following methods to no avail (no missing values replaced):

library(tidyverse)
library(zoo)
library(plyr)

# Attempt 1:
data %>%
  group_by(ID) %>%
  mutate_all(~replace(., is.na(.), .[!is.na(.)][1]))

# Attempt 2:
plyr::ddply(data, ~ID, na.locf)

I've found solutions for cases in which missing values of a single column need to be replaced, but I have not yet found a solution for replacing missing values across an entire dataframe.

Thanks in advance.

Robert Wilson · Accepted Answer

You can solve this using the group_by/mutate method in dplyr:

library(dplyr)
df <- tibble(
  ID = c("A23", "A23", "A24", "A24", "A24", "A50", "A50", "A60", "A60"), 
  A = c("Y", "Y", "N", "N", NA, "Y", NA, "N", "N"), 
  B = c(2.4, 2.4, 2.6, 2.6, NA, 2.2, NA, 3.3, 3.3), 
  C = c("M", "M", "F", "F", NA, "F", NA, "M", "M"), 
  D = c(1, 1, 0, 0, NA, 1, NA, 1, 1), 
  E = c("2015-02-16", "2017-10-05", "2014-03-26", "2016-09-07", "2018-10-08", 
 "2015-02-17", "2017-10-06", "2014-03-27", "2016-09-08"))



fix_nas <- function(x){
  x[complete.cases(x)][1]

}

df %>% 
  group_by(ID) %>% 
  mutate_at(2:5, fix_nas)

R: Replace missing values in one row based on unique ID for all variables in dataframe

Answers (2)

Related Questions