Reputation: 1
I need to fill the NAs in a data frame in R, within groups defined by 2-3 grouping variables, using the median or the mode.
Specifically, I am trying to impute NAs with the group median for numeric variables and with the group mode for factor variables.
I searched the site but could not find any appropriate suggestions to help me.
Some of the answers impute all NAs at once without grouping, or handle only one variable at a time. My data frame has more than 40 columns.
If anybody can show a clear way to do this, I would be very grateful.
Here's my rough code, which does not work.
fillna_cols <- c(d,e,f,g,h...)
df %>%
group_by(a,b,c) %>%
mutate_at(fillna_cols, na.aggregate(df,FUN = median))
Upvotes: 0
Views: 227
Reputation: 2770
Fabricating some data
mtcars[ c(4,5,9) , "wt" ] <- NA
Take a look
head( mtcars)
Overwrite the missing values with the overall mean
mtcars[ is.na( mtcars$wt) , "wt"] <- mean( mtcars$wt , na.rm=T)
Or, instead of the mean step above, use the median within each group (here, each level of am)
mtcars[is.na(mtcars$wt) & mtcars$am == 0, "wt"] <- median(mtcars$wt[mtcars$am == 0], na.rm = TRUE)
mtcars[is.na(mtcars$wt) & mtcars$am == 1, "wt"] <- median(mtcars$wt[mtcars$am == 1], na.rm = TRUE)
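Writing one line per group level gets tedious once there are several groups. A minimal base R sketch of the same idea using ave(), which applies a function within each group and returns a vector aligned with the original rows. The copy named df and the helper vectors miss and group_median are only illustrative names, not part of the original answer.

```r
# Work on a copy so the built-in mtcars stays untouched.
df <- mtcars
df[c(4, 5, 9), "wt"] <- NA

# Remember which rows are missing before anything is overwritten.
miss <- is.na(df$wt)

# ave() computes the median of wt within each level of am and
# returns one value per row, in the original row order.
group_median <- ave(df$wt, df$am, FUN = function(x) median(x, na.rm = TRUE))

# Fill only the missing rows with their group's median.
df$wt[miss] <- group_median[miss]
```

More grouping variables can be passed as extra arguments to ave(), for example ave(df$wt, df$cyl, df$am, FUN = ...).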
Or a data.table solution
library( data.table)
mtcars <- data.table( mtcars)
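# median of wt within each cyl/am cell (named wt_median to avoid shadowing median())
mtcars[, wt_median := median(wt, na.rm = TRUE), by = .(cyl, am)]
# impwt keeps observed wt and falls back to the cell median where wt is missing
mtcars[, impwt := ifelse(is.na(wt), wt_median, wt)]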
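The question asks about many columns at once, with the median for numeric columns and the mode for factors. One possible way to extend the approach is sketched below in base R. The helpers stat_mode(), impute_one() and impute_by_group() are hypothetical names written for this example, not functions from any package, and ties for the mode are resolved by taking the first value seen, which is an assumption you may want to change.

```r
# Hypothetical helper: most frequent non-missing value (ties -> first seen).
stat_mode <- function(x) {
  ux <- unique(x[!is.na(x)])
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: fill NAs in one vector with its median or mode.
impute_one <- function(x) {
  if (all(is.na(x))) return(x)  # nothing to learn from; leave as is
  fill <- if (is.numeric(x)) median(x, na.rm = TRUE) else stat_mode(x)
  x[is.na(x)] <- fill
  x
}

# Hypothetical helper: apply impute_one() per group to each target column.
impute_by_group <- function(df, group_cols, target_cols) {
  g <- interaction(df[group_cols], drop = TRUE)
  for (col in target_cols) {
    df[[col]] <- ave(df[[col]], g, FUN = impute_one)
  }
  df
}

# Example on a copy of mtcars with one numeric and one factor column.
df <- mtcars
df$gear <- factor(df$gear)
df[c(4, 5, 9), "wt"] <- NA
df[c(2, 10), "gear"] <- NA

out <- impute_by_group(df, group_cols = c("cyl", "am"),
                       target_cols = c("wt", "gear"))
```

With 40 columns, target_cols can be built programmatically, for example setdiff(names(df), group_cols). Groups in which a column is entirely NA are left unchanged, so it is worth checking anyNA(out) afterwards.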
Upvotes: 2