Reputation: 511
I'm working with a "long" dataframe in which unique participants have multiple observations over time. Some rows are missing data across multiple columns, and those missing records need to be filled with the same data for each participant.
My data set looks like:
list(
ID = c("A23", "A23", "A24", "A24", "A24", "A50", "A50", "A60", "A60"),
A = c("Y", "Y", "N", "N", NA, "Y", NA, "N", "N"),
B = c(2.4, 2.4, 2.6, 2.6, NA, 2.2, NA, 3.3, 3.3),
C = c("M", "M", "F", "F", NA, "F", NA, "M", "M"),
D = c(1, 1, 0, 0, NA, 1, NA, 1, 1),
E = c("2015-02-16", "2017-10-05", "2014-03-26", "2016-09-07", "2018-10-08", "2015-02-17", "2017-10-06", "2014-03-27", "2016-09-08"))
I would like to be able to simply copy the data from the complete/non-missing rows and paste it into the missing records based on the ID
variable, across the entire dataframe.
I have tried the following methods to no avail (no missing values replaced):
library(tidyverse)
library(zoo)
library(plyr)
# Attempt 1:
data %>%
group_by(ID) %>%
mutate_all(~replace(., is.na(.), .[!is.na(.)][1]))
# Attempt 2:
plyr::ddply(data, ~ID, na.locf)
I've found solutions for cases in which missing values of a single column need to be replaced, but I have not yet found a solution for replacing missing values across an entire dataframe.
Thanks in advance.
Upvotes: 1
Views: 826
Reputation: 155
fill()
from tidyr
should work:
df %>%
group_by(ID) %>%
fill(names(df))
# A tibble: 9 x 6
# Groups: ID [4]
ID A B C D E
<chr> <chr> <dbl> <chr> <dbl> <chr>
1 A23 Y 2.4 M 1 2015-02-16
2 A23 Y 2.4 M 1 2017-10-05
3 A24 N 2.6 F 0 2014-03-26
4 A24 N 2.6 F 0 2016-09-07
5 A24 N 2.6 F 0 2018-10-08
6 A50 Y 2.2 F 1 2015-02-17
7 A50 Y 2.2 F 1 2017-10-06
8 A60 N 3.3 M 1 2014-03-27
9 A60 N 3.3 M 1 2016-09-08
Upvotes: 1
Reputation: 3417
You can solve this using the group_by/mutate method in dplyr:
library(dplyr)
df <- tibble(
ID = c("A23", "A23", "A24", "A24", "A24", "A50", "A50", "A60", "A60"),
A = c("Y", "Y", "N", "N", NA, "Y", NA, "N", "N"),
B = c(2.4, 2.4, 2.6, 2.6, NA, 2.2, NA, 3.3, 3.3),
C = c("M", "M", "F", "F", NA, "F", NA, "M", "M"),
D = c(1, 1, 0, 0, NA, 1, NA, 1, 1),
E = c("2015-02-16", "2017-10-05", "2014-03-26", "2016-09-07", "2018-10-08",
"2015-02-17", "2017-10-06", "2014-03-27", "2016-09-08"))
fix_nas <- function(x){
x[complete.cases(x)][1]
}
df %>%
group_by(ID) %>%
mutate_at(2:5, fix_nas)
Upvotes: 1