Reputation: 21
Assume that you have a dataset like starwars with two columns: a numeric one containing 20 NA values, and one with the species (Human, Droid, machine, etc.).
How can I use pipes to replace only the NA values that belong to the Human category with the mean height of humans?
Using the overall mean would be wrong, because machines may be a lot smaller or taller, and as a result we would get some strange values for the heights of the humans.
P.S. I know how to do this using replace or ifelse, but how do I add the grouping by category?
Upvotes: 1
Views: 98
Reputation: 690
If I understand you correctly, you just want to replace NAs by group means?
This should do:
library(dplyr)
data(starwars)
head(starwars)
# This shows one missing value (NAs) for "Droid"
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
# Replace NAs with group-wise means
starwars <- starwars %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()
# Now there are no missing values any more, and the means (M) remain the same
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
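To check the group-wise replacement on something small enough to verify by hand, here is a minimal sketch on a made-up tibble. The object names toy and filled and the height values are hypothetical and only for illustration; the pipeline itself is the same one used above.

```r
library(dplyr)

# Hypothetical toy data: two groups, one missing height in each
toy <- tibble(
  species = c("Human", "Human", "Human", "Droid", "Droid"),
  height  = c(170L, 180L, NA, 100L, NA)
)

# Replace each NA with the mean of its own species
filled <- toy %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height),
                          mean(height, na.rm = TRUE),
                          as.double(height))) %>%
  ungroup()

filled$height
#> [1] 170 180 175 100 100
```

The Human NA becomes 175 (the mean of 170 and 180) and the Droid NA becomes 100, so neither group's mean leaks into the other.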
Upvotes: 2
Reputation: 7818
In the starwars scenario, you can do the following:
library(dplyr)
starwars %>%
group_by(species) %>%
mutate(height = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
ungroup()
As you can see below, height is filled with the average only where species is Human:
library(dplyr)
starwars %>%
group_by(species) %>%
mutate(newheight = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
ungroup() %>%
select(species, height, newheight) %>%
filter(is.na(height))
#> # A tibble: 6 x 3
#>   species height newheight
#>   <chr>    <int>     <dbl>
#> 1 Human       NA      177.
#> 2 Human       NA      177.
#> 3 Human       NA      177.
#> 4 Human       NA      177.
#> 5 Droid       NA       NA
#> 6 NA          NA       NA
In this specific example, you need to convert height to a double because it is stored as an integer. Since if_else is type-consistent and mean returns a double, you need to convert height accordingly so that both branches have the same type.
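The type point can be seen on a tiny vector. This is only a sketch: the vector h and its values are made up. How strictly if_else treats a mix of integer and double may depend on your dplyr version, so the explicit as.double() is the safe choice either way.

```r
library(dplyr)

# Hypothetical integer heights with one missing value
h <- c(170L, NA, 180L)

typeof(h)                      # "integer"
typeof(mean(h, na.rm = TRUE))  # "double"

# Casting explicitly keeps both branches of if_else() the same type
filled <- if_else(is.na(h), mean(h, na.rm = TRUE), as.double(h))

typeof(filled)                 # "double"
filled
#> [1] 170 175 180
```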
Upvotes: 2