Reputation: 49
I have a data.frame that looks like this:
GROUP | YEAR | VAL
A | 2007 | 10
A | 2007 | 11
A | 2007 | NA
A | 2008 | 13
B | 2006 | NA
B | 2006 | 5
B | 2006 | 6
Each group may cover different years. I want to replace each NA with the mean of its group in its year. For example, the NA in row 3 should be replaced by the mean of group A in year 2007.
I can do this with a for loop, but unfortunately my professor hates "for" loops, so I'm trying to find another way. I tried writing a function like imputeMean(group, year), which takes the group and year, calculates the mean, and then mutates the data.frame. I then apply this function over a data.frame of the group and year pairs that need replacing.
Unfortunately, R does not have pass-by-reference, which means I can't modify the original data.frame directly inside the imputeMean() function. Is there any way to filter a data.frame, calculate the group mean with respect to year, and replace each NA value with that mean, without using a loop?
Upvotes: 1
Views: 48
Reputation: 28945
Another dplyr solution:
library(dplyr)
df1 %>%
  group_by(GROUP, YEAR) %>%
  mutate_at(vars(VAL), list(~ifelse(is.na(.), mean(., na.rm = TRUE), .)))
# GROUP YEAR VAL
# 1 A 2007 10
# 2 A 2007 11
# 3 A 2007 10.5
# 4 A 2008 13
# 5 B 2006 5.5
# 6 B 2006 5
# 7 B 2006 6
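For anyone who wants to run this end to end, here is a minimal sketch. It assumes df1 is built exactly from the table in the question (the construction below is my assumption, since the question does not show how the data.frame was created), and it uses a plain mutate() instead of mutate_at(), which newer dplyr releases mark as superseded. The name res is only illustrative.

```r
library(dplyr)

# Assumed reconstruction of the question's table
df1 <- data.frame(
  GROUP = c("A", "A", "A", "A", "B", "B", "B"),
  YEAR  = c(2007, 2007, 2007, 2008, 2006, 2006, 2006),
  VAL   = c(10, 11, NA, 13, NA, 5, 6)
)

res <- df1 %>%
  group_by(GROUP, YEAR) %>%
  # Within each GROUP/YEAR block, fill NA with that block's mean
  mutate(VAL = ifelse(is.na(VAL), mean(VAL, na.rm = TRUE), VAL)) %>%
  ungroup()
```

Note that a GROUP/YEAR block containing only NAs would end up with NaN, because there is nothing to average.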
Upvotes: 1
Reputation: 887163
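We can use na.aggregate from the zoo package after grouping by 'GROUP' and 'YEAR'. By default it replaces each NA with the mean of the non-missing values, so here that is the mean within each group and year.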
library(dplyr)
library(zoo)
df1 %>%
  group_by(GROUP, YEAR) %>%
  mutate(VAL = na.aggregate(VAL))
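If extra packages are not an option, the same grouped fill can probably be done in base R with ave(). This is a different technique from na.aggregate, shown only as a sketch; df1 is again assumed to match the question's table, and fill_mean is a hypothetical helper name.

```r
# Assumed reconstruction of the question's table
df1 <- data.frame(
  GROUP = c("A", "A", "A", "A", "B", "B", "B"),
  YEAR  = c(2007, 2007, 2007, 2008, 2006, 2006, 2006),
  VAL   = c(10, 11, NA, 13, NA, 5, 6)
)

# Hypothetical helper: replace NAs in a vector with the mean of the rest
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() applies fill_mean within each GROUP/YEAR combination
# and returns the values in the original row order
df1$VAL <- ave(df1$VAL, df1$GROUP, df1$YEAR, FUN = fill_mean)
```

No explicit loop is needed, and the result is assigned back to the column, so the pass-by-reference concern from the question does not come up.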
Upvotes: 1