Reputation: 3233
Consider a simplified dataset (the real one has more columns and rows):
df
tp tf weight
1 FWD RF 78.86166
2 MF LF 81.04566
3 DEF LF 80.70527
4 DEF LF 82.96071
5 DEF RF 78.42544
6 GK LF 79.37686
7 DEF RF 78.79928
8 MF RF NA
9 MF RF 78.93815
10 DEF RF 80.00284
I want to fill the missing values in weight by the grouped median of tp and tf combined
What i have tried till now is the following (I have used dlpyr)
temp <- df %>% group_by(tp,tf) %>% summarise(mvalue = median(weight,na. rm = TRUE))
This gives temp to be :
temp
Source: local data frame [6 x 3]
Groups: tp [?]
tp tf mvalue
<fctr> <fctr> <dbl>
1 DEF LF 81.83299
2 DEF RF 78.79928
3 FWD RF 78.86166
4 GK LF 79.37686
5 MF LF 81.04566
6 MF RF 78.93815
Now i am unable to figure out how to fill the missing values in df with the corresponding group median.
In my simple case there is only one NA corresponding to tp = MF and tf = RF, the median value if you look up at temp is 78.93815
How do i do this in general? Do suggest if you have a better approach than my initial one.
EDIT: The actual dataframe has a unique Id variable if that makes any difference or can help.
Upvotes: 3
Views: 2626
Reputation: 886938
We can use the na.aggregate
from zoo
and change the FUN
argument to median
. By default it would be mean
library(zoo)
library(dplyr)
df %>%
group_by(tp, tf) %>%
mutate(weight = na.aggregate(weight, FUN = median))
# tp tf weight
# <chr> <chr> <dbl>
#1 FWD RF 78.86166
#2 MF LF 81.04566
#3 DEF LF 80.70527
#4 DEF LF 82.96071
#5 DEF RF 78.42544
#6 GK LF 79.37686
#7 DEF RF 78.79928
#8 MF RF 78.93815
#9 MF RF 78.93815
#10 DEF RF 80.00284
The corresponding syntax for data.table
would be
library(data.table)
setDT(df)[, weight := na.aggregate(weight, FUN = median), by = .(tp, tf)]
Upvotes: 4
Reputation: 51582
You can try,
library(dplyr)
df %>%
group_by(tp, tf) %>%
mutate(weight = replace(weight, is.na(weight), median(weight, na.rm = TRUE)))
Upvotes: 5