Reputation: 4301
I have a data frame of the form:
Weight  Day  Hour
NA      M    0
NA      M    1
2       M    2
1       M    3
4       T    0
5       T    1
NA      T    2
2       T    3
3       W    0
3       W    1
1       W    2
NA      W    3
For a given NA value in Weight, I want to replace it with the average of the non-NA values having the same value for Hour. For example, the first value in Weight is NA. Its Hour value is 0, so I want to average the other Weights where Hour is 0 (those values being 4 and 3). I then want to replace the NA with the computed average (3.5).
As an R beginner, I'd like to see a clear, multistep process for this. (I'm posing this as a learning exercise rather than a specific "solve this problem" type question. I'm not interested in who can do it in the fewest characters.)
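For anyone who wants to reproduce this, the sample above can be loaded roughly as follows. This is only a sketch: the name `dat` is a placeholder, not something my real code depends on.

```r
# Build the sample data frame shown above; `dat` is a placeholder name.
dat <- read.table(text = "
Weight Day Hour
NA M 0
NA M 1
2 M 2
1 M 3
4 T 0
5 T 1
NA T 2
2 T 3
3 W 0
3 W 1
1 W 2
NA W 3
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types and the missing values
str(dat)
```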
Upvotes: 1
Views: 235
Reputation: 92300
Here's a dplyr solution. It is fast and easy to read because of its piped structure, so it could be a good starting point for a beginner. Assuming df is your data set:
library(dplyr)
df %>%                                        # Start from your data set
  group_by(Hour) %>%                          # Group by Hour
  mutate(Weight = ifelse(is.na(Weight),
                         mean(Weight, na.rm = TRUE),
                         Weight))             # Replace each NA with its group's mean
Upvotes: 4
Reputation: 887901
You could also use data.table:
library(data.table)
# Note: this returns only Hour and Weight (other columns such as Day are dropped)
setDT(dat)[, list(Weight = replace(Weight, is.na(Weight),
                                   mean(Weight, na.rm = TRUE))), by = Hour]
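Or, to update dat in place and keep all columns:
# Add a helper column with each Hour's mean, fill the NAs from it, then drop it
setDT(dat)[, Weight1 := mean(Weight, na.rm = TRUE), by = Hour][,
  Weight := ifelse(is.na(Weight), Weight1, Weight)][, Weight1 := NULL]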
Upvotes: 4
Reputation: 121608
You can use ave for such operations:
dat$Weight <-
  ave(dat$Weight, dat$Hour, FUN = function(x) {
    mm <- mean(x, na.rm = TRUE)   # mean of the non-NA weights for this Hour
    ifelse(is.na(x), mm, x)       # use it wherever the weight is missing
  })
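Since the question asks for explicit steps, the same idea can also be spelled out one step at a time in base R. This is only a sketch: the names `hour_means` and `missing` are my own, and the data frame is rebuilt here from the sample in the question.

```r
# Sample data from the question (rebuilt here so the example runs on its own)
dat <- data.frame(
  Weight = c(NA, NA, 2, 1, 4, 5, NA, 2, 3, 3, 1, NA),
  Day    = rep(c("M", "T", "W"), each = 4),
  Hour   = rep(0:3, times = 3)
)

# Step 1: compute the mean of the non-NA weights for each Hour.
# tapply returns one value per Hour, named "0", "1", "2", "3".
hour_means <- tapply(dat$Weight, dat$Hour, mean, na.rm = TRUE)

# Step 2: flag the rows whose Weight is missing.
missing <- is.na(dat$Weight)

# Step 3: for each flagged row, look up the mean for its Hour by name
# and write it into Weight.
dat$Weight[missing] <- hour_means[as.character(dat$Hour[missing])]

print(dat)
```

With the sample data, the first NA (Hour 0) becomes 3.5, matching the value worked out in the question.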
Upvotes: 4