Reputation: 4407
I want to replace all missing values with the mean of their group, across multiple columns. I would like to use dplyr,
but my attempt does not work.
For example:
iris2 <- iris
set.seed(1)
iris2[-5] <- lapply(iris2[-5], function(x) {
  x[sample(length(x), sample(10, 1))] <- NA
  x
})
library(dplyr)
impute_missing <- function(x) {
  # Replace NAs with the mean of the non-missing values
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}
iris2 %>% group_by(Species) %>% sapply(impute_missing)
However, the code did not impute the missing values by Species; it used the mean of all the non-missing values in each column. Another odd thing is that the function was also applied to Species,
the grouping variable. Is there any way to impute the mean by species and keep a complete data frame?
Upvotes: 4
Views: 4482
Reputation: 887148
Try:
library(dplyr)
iris2New <- iris2 %>%
  group_by(Species) %>%
  mutate_each(funs(mean=mean(., na.rm=TRUE)), contains("."))
iris2[,-5][is.na(iris2)[,-5]] <- iris2New[,-5][is.na(iris2)[,-5]]
iris2
Or, you could use ifelse on the initial dataset iris2:
fun1 <- function(x) ifelse(is.na(x), mean(x, na.rm=TRUE), x)
iris3 <- iris2 %>%
  group_by(Species) %>%
  mutate_each(funs(fun1), contains("."))
identical(as.data.frame(iris3), iris2)
#[1] TRUE
Or, instead of defining a separate function, you can write the expression inline:
iris4 <- iris2 %>%
  group_by(Species) %>%
  mutate_each(funs(ifelse(is.na(.), mean(., na.rm=TRUE), .)), contains("."))
identical(iris3,iris4)
#[1] TRUE
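Note that mutate_each() and funs() were deprecated in later dplyr releases. If you are on dplyr 1.0.0 or newer, the same group-wise imputation can probably be written with mutate() and across(). The following is only a sketch under that assumption: it rebuilds the example data from the question so that it runs on its own, and the name iris5 is just an illustrative choice.

```r
library(dplyr)

# Rebuild the example data from the question
iris2 <- iris
set.seed(1)
iris2[-5] <- lapply(iris2[-5], function(x) {
  x[sample(length(x), sample(10, 1))] <- NA
  x
})

# Within each Species, replace NAs in every numeric column
# with that column's mean over the non-missing values of the group
iris5 <- iris2 %>%
  group_by(Species) %>%
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()

# Check whether any missing values remain
anyNA(iris5)
```

Because the data are grouped before mutate() is called, mean() is evaluated separately for each species, and the grouping column itself is left untouched.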
Upvotes: 4