Reputation: 41
I have a data-set comprising of model,Brand and Engine in the engine column it is having 29 Missing values I wanted to fill those missing values based on the median or mean of that particular model How can i do it ?
I have used this syntax:
usedcars %>%
group_by(Brand,Model) %>%
filter(is.na(Engine)) %>%
summarise(Engine)
Groups: Brand, Model [210]
Brand Model Engine
<fct> <fct> <dbl>
1 Audi A3-series 1968
2 Audi A3-series 1968
3 Audi A3-series 1968
4 Audi A4 1968
5 Audi A4 1968
6 Audi A4 1968
7 Audi A4 1968
8 Audi A4 2967
9 Audi A4 1968
10 Audi A4 1968
Upvotes: 1
Views: 61
Reputation: 79204
We could use ifelse
: here example with mean
. you could use median
if preferred:
library(dplyr)
df <- usedcars %>%
group_by(Brand, Model) %>%
mutate(Engine_Mean = mean(Engine, na.rm = TRUE)) %>%
mutate(Engine = ifelse(is.na(Engine), Engine_Mean, Engine), .keep="unused")
Output:
Brand Model Engine
<chr> <chr> <dbl>
1 Audi A3-series 1968
2 Audi A3-series 1968
3 Audi A3-series 1968
4 Audi A4 1968
5 Audi A4 1968
6 Audi A4 1968
7 Audi A4 1968
8 Audi A4 2967
9 Audi A4 1968
10 Audi A4 2093.
11 Audi A3-series 1968
12 Audi A3-series 1968
13 Audi A3-series 1968
14 Audi A4 2093.
15 Audi A4 2093.
16 Audi A4 2093.
17 Audi A4 2093.
18 Audi A4 2093.
19 Audi A4 1968
20 Audi A4 1968
Example data:
usedcars <- tribble(
~Brand, ~Model, ~Engine,
"Audi", "A3-series", 1968,
"Audi", "A3-series", 1968,
"Audi", "A3-series", 1968,
"Audi", "A4", 1968,
"Audi", "A4", 1968,
"Audi", "A4", 1968,
"Audi", "A4", 1968,
"Audi", "A4", 2967,
"Audi", "A4", 1968,
"Audi", "A4", NA,
"Audi", "A3-series", NA,
"Audi", "A3-series", NA,
"Audi", "A3-series", NA,
"Audi", "A4", NA,
"Audi", "A4", NA,
"Audi", "A4", NA,
"Audi", "A4", NA,
"Audi", "A4", NA,
"Audi", "A4", 1968,
"Audi", "A4", 1968)
Upvotes: 2
Reputation: 887851
We could use na.aggregate
from zoo
(by default, it replaces the missing values with mean
)
library(dplyr)
library(zoo)
usedcars <- usedcars %>%
group_by(Brand, Model) %>%
mutate(Engine_mean = na.aggregate(Engine),
Engine_median = na.aggregate(Engine, FUN = median))
Upvotes: 3
Reputation: 389235
You can use -
library(dplyr)
usedcars%>%
group_by(Brand,Model)%>%
#replace NA with mean
mutate(Engine_mean = replace(Engine, is.na(Engine), mean(Engine, na.rm = TRUE)),
#Replace NA with median
Engine_median = replace(Engine, is.na(Engine), median(Engine, na.rm = TRUE)))
Upvotes: 3