grouping and summarizing to find mean and median based on Group

I have a data-set comprising of model,Brand and Engine in the engine column it is having 29 Missing values I wanted to fill those missing values based on the median or mean of that particular model How can i do it ?

I have used this syntax:

usedcars  %>% 
  group_by(Brand,Model) %>%
  filter(is.na(Engine)) %>%
  summarise(Engine)


Groups:   Brand, Model [210]
   Brand Model     Engine
   <fct> <fct>      <dbl>
 1 Audi  A3-series   1968
 2 Audi  A3-series   1968
 3 Audi  A3-series   1968
 4 Audi  A4          1968
 5 Audi  A4          1968
 6 Audi  A4          1968
 7 Audi  A4          1968
 8 Audi  A4          2967
 9 Audi  A4          1968
10 Audi  A4          1968

Upvotes: 1

Views: 61

Answers (3)

TarJae
TarJae

Reputation: 79204

We could use ifelse: here example with mean. you could use median if preferred:

library(dplyr)
df <- usedcars %>% 
  group_by(Brand, Model) %>% 
  mutate(Engine_Mean = mean(Engine, na.rm = TRUE)) %>% 
  mutate(Engine = ifelse(is.na(Engine), Engine_Mean, Engine), .keep="unused")

Output:

   Brand Model     Engine
   <chr> <chr>      <dbl>
 1 Audi  A3-series  1968 
 2 Audi  A3-series  1968 
 3 Audi  A3-series  1968 
 4 Audi  A4         1968 
 5 Audi  A4         1968 
 6 Audi  A4         1968 
 7 Audi  A4         1968 
 8 Audi  A4         2967 
 9 Audi  A4         1968 
10 Audi  A4         2093.
11 Audi  A3-series  1968 
12 Audi  A3-series  1968 
13 Audi  A3-series  1968 
14 Audi  A4         2093.
15 Audi  A4         2093.
16 Audi  A4         2093.
17 Audi  A4         2093.
18 Audi  A4         2093.
19 Audi  A4         1968 
20 Audi  A4         1968 

Example data:

usedcars <- tribble(
~Brand, ~Model,     ~Engine,
"Audi",  "A3-series",   1968,
"Audi",  "A3-series",   1968,
"Audi",  "A3-series",   1968,
"Audi",  "A4",          1968,
"Audi",  "A4",          1968,
"Audi",  "A4",          1968,
"Audi",  "A4",          1968,
"Audi",  "A4",          2967,
"Audi",  "A4",          1968,
"Audi",  "A4",          NA,
"Audi",  "A3-series",   NA,
"Audi",  "A3-series",   NA,
"Audi",  "A3-series",   NA,
"Audi",  "A4",          NA,
"Audi",  "A4",          NA,
"Audi",  "A4",          NA,
"Audi",  "A4",          NA,
"Audi",  "A4",          NA,
"Audi",  "A4",          1968,
"Audi",  "A4",          1968)

Upvotes: 2

akrun
akrun

Reputation: 887851

We could use na.aggregate from zoo (by default, it replaces the missing values with mean)

library(dplyr)
library(zoo)
usedcars <- usedcars %>%
       group_by(Brand, Model) %>%
       mutate(Engine_mean = na.aggregate(Engine),
              Engine_median = na.aggregate(Engine, FUN = median))

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 389235

You can use -

library(dplyr)

usedcars%>%
  group_by(Brand,Model)%>%
         #replace NA with mean
  mutate(Engine_mean = replace(Engine, is.na(Engine), mean(Engine, na.rm = TRUE)), 
         #Replace NA with median
         Engine_median = replace(Engine, is.na(Engine), median(Engine, na.rm = TRUE)))

Upvotes: 3

Related Questions