Eric

Reputation: 4301

Impute missing values with the average of the remainder

I have a data frame of the form:

Weight  Day     Hour
NA      M       0
NA      M       1
2       M       2
1       M       3
4       T       0
5       T       1
NA      T       2
2       T       3
3       W       0
3       W       1
1       W       2
NA      W       3

For a given NA value in Weight, I want to replace it with the average of the non-NA values having the same value for Hour. For example, the first value in Weight is NA. Its Hour value is 0, so I want to average the other Weights where Hour is 0 (those values being 4 and 3). I then want to replace the NA with the computed average (3.5).

As an R beginner, I'd like to see a clear, multistep process for this. (I'm posing this as a learning exercise rather than a specific "solve this problem" type question. I'm not interested in who can do it in the fewest characters.)

Upvotes: 1

Views: 235

Answers (3)

David Arenburg

Reputation: 92300

Here's a dplyr solution. It is both fast and easy to understand (because of its piped structure), so it could be a good start for a beginner. Assuming df is your data set:

library(dplyr)
df %>% # Select your data set
  group_by(Hour) %>% # Group by Hour
  mutate(Weight = ifelse(is.na(Weight), 
                         mean(Weight, na.rm = TRUE), 
                         Weight)) # Replace all NAs with the mean
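
To try this end to end, here is a minimal sketch that rebuilds the question's sample data and stores the result. The name filled is only illustrative, and the trailing ungroup() is an addition so that later operations are not silently grouped by Hour.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Weight = c(NA, NA, 2, 1, 4, 5, NA, 2, 3, 3, 1, NA),
  Day    = rep(c("M", "T", "W"), each = 4),
  Hour   = rep(0:3, times = 3)
)

filled <- df %>%
  group_by(Hour) %>%                             # one group per Hour
  mutate(Weight = ifelse(is.na(Weight),
                         mean(Weight, na.rm = TRUE),  # mean of the group's non-NA values
                         Weight)) %>%
  ungroup()                                      # drop the grouping again

filled$Weight
# 3.5 4.0 2.0 1.0 4.0 5.0 1.5 2.0 3.0 3.0 1.0 1.5
```

The pipeline on its own only prints the result, so df stays unchanged unless the output is assigned, as done here.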

Upvotes: 4

akrun

Reputation: 887901

You could also use data.table. This first version returns a new data.table containing only the Hour and Weight columns:

library(data.table)
setDT(dat)[, list(Weight = replace(Weight, is.na(Weight),
                                   mean(Weight, na.rm = TRUE))), by = Hour]

Or, to update dat by reference and keep every column:

setDT(dat)[, Weight1:=mean(Weight, na.rm=TRUE), by=Hour][,
              Weight:=ifelse(is.na(Weight), Weight1, Weight)][, Weight1:=NULL]
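
As a rough sketch, the two variants can also be combined into a single grouped update by reference, using replace() inside :=. It assumes Weight is stored as a double, as in the sample data rebuilt below. If it were an integer column, converting it with as.numeric() first would probably be needed to keep the fractional means.

```r
library(data.table)

# Sample data copied from the question
dat <- data.table(
  Weight = c(NA, NA, 2, 1, 4, 5, NA, 2, 3, 3, 1, NA),
  Day    = rep(c("M", "T", "W"), each = 4),
  Hour   = rep(0:3, times = 3)
)

# Within each Hour group, overwrite the NAs with that group's mean.
# := modifies dat in place, so no reassignment is needed.
dat[, Weight := replace(Weight, is.na(Weight), mean(Weight, na.rm = TRUE)),
    by = Hour]

dat$Weight
# 3.5 4.0 2.0 1.0 4.0 5.0 1.5 2.0 3.0 3.0 1.0 1.5
```

Row order and the Day column are left untouched, which avoids the temporary Weight1 column.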

Upvotes: 4

agstudy

Reputation: 121608

You can use base R's ave for such operations:

dat$Weight <- 
ave(dat$Weight,dat$Hour,FUN=function(x){
  mm <- mean(x,na.rm=TRUE)
  ifelse(is.na(x),mm,x)
})

  • The function is applied separately to each group of hours.
  • For each group, you compute the mean without the missing values.
  • You assign the mean if the value is missing; otherwise you keep the original value.
  • You replace the Weight vector with the newly created vector.
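
Since the question asks for a clear multistep process, the steps above can also be spelled out one at a time so that every intermediate value can be inspected. This is only a sketch in base R that swaps ave for tapply plus a lookup. The names hour_means, row_means and is_missing are illustrative, and dat is rebuilt from the question's sample data.

```r
# Sample data copied from the question
dat <- data.frame(
  Weight = c(NA, NA, 2, 1, 4, 5, NA, 2, 3, 3, 1, NA),
  Day    = rep(c("M", "T", "W"), each = 4),
  Hour   = rep(0:3, times = 3)
)

# Step 1: mean of the non-missing weights for each Hour.
# The result is named by Hour: "0", "1", "2", "3".
hour_means <- tapply(dat$Weight, dat$Hour, mean, na.rm = TRUE)

# Step 2: look up, for every row, the mean that belongs to its Hour.
row_means <- unname(hour_means[as.character(dat$Hour)])

# Step 3: flag the rows whose Weight is missing.
is_missing <- is.na(dat$Weight)

# Step 4: overwrite only the missing weights with their Hour's mean.
dat$Weight[is_missing] <- row_means[is_missing]

dat$Weight
# 3.5 4.0 2.0 1.0 4.0 5.0 1.5 2.0 3.0 3.0 1.0 1.5
```

Printing hour_means after step 1 shows 3.5 for Hour 0, which matches the average of 4 and 3 worked out in the question.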

Upvotes: 4
