TheKrab

Reputation: 21

Replacing only some NA values, not all

Assume that you have a dataset like starwars, with two columns: one numeric column with 20 NA values, and one with the species (Human, Droid, machine, etc.).

Using pipes, how can I replace only the NA values that belong to the Human category with the mean height of the humans?

If I replace them with the overall mean, the result will be wrong: machines may be a lot shorter or taller than humans, so the imputed heights for the humans would be strange values.
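A small illustration of the problem, using a hypothetical data frame with made-up heights (not the real starwars values): the overall mean is dragged toward the much shorter droids, while the per-species mean gives a sensible value for a human.

```r
# Hypothetical toy data: two species with very different typical heights
df <- data.frame(
  species = c("Human", "Human", "Human", "Droid", "Droid", "Droid"),
  height  = c(180L, 170L, NA, 96L, 100L, NA)
)

# Overall mean: pulled down by the droids
overall_mean <- mean(df$height, na.rm = TRUE)

# Per-species means: what a human with an unknown height should get
species_means <- tapply(df$height, df$species, mean, na.rm = TRUE)

overall_mean   # 136.5
species_means  # Droid 98, Human 175
```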

P.S. I know how to do it with replace() or ifelse(), but not how to add the grouping by category.

Upvotes: 1

Views: 98

Answers (2)

benimwolfspelz

Reputation: 690

If I understand you correctly, you just want to replace NAs with group means?

This should do:

library(dplyr)

data(starwars)

head(starwars)

# Mean height and number of NAs per species ("Droid", for example, has one missing height)
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))

# Replace NAs with group-wise means
starwars <- starwars %>%
  group_by(species) %>%
  mutate(height = if_else(is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>%
  ungroup()

# Now there are no missing values left, and the group means (M) are unchanged
starwars %>%
  group_by(species) %>%
  summarize(M = mean(height, na.rm = TRUE),
            NAs = sum(is.na(height)))
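For comparison, here is a minimal base-R sketch of the same group-wise replacement without dplyr. It uses ave() to compute each row's species mean; the small data frame is hypothetical and only stands in for starwars.

```r
# Hypothetical toy data (not the real starwars columns)
df <- data.frame(
  species = c("Human", "Human", "Human", "Droid", "Droid", "Droid"),
  height  = c(180L, 170L, NA, 96L, 100L, NA)
)

# ave() returns, for every row, the mean height of that row's species
group_mean <- ave(df$height, df$species, FUN = function(h) mean(h, na.rm = TRUE))

# Keep observed heights; fill only the missing ones with their group's mean
df$height <- ifelse(is.na(df$height), group_mean, df$height)

df$height  # 180 170 175 96 100 98
```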

Upvotes: 2

Edo

Reputation: 7818

In the starwars scenario, you can do the following:

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  mutate(height = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>% 
  ungroup()

As you can see here, the missing heights are filled with the average only where the species is Human:

library(dplyr)

starwars %>% 
  group_by(species) %>% 
  mutate(newheight = if_else(species == "Human" & is.na(height), mean(height, na.rm = TRUE), as.double(height))) %>% 
  ungroup() %>% 
  select(species, height, newheight) %>% 
  filter(is.na(height))

#> # A tibble: 6 x 3
#>   species height newheight
#>   <chr>    <int>     <dbl>
#> 1 Human       NA      177.
#> 2 Human       NA      177.
#> 3 Human       NA      177.
#> 4 Human       NA      177.
#> 5 Droid       NA       NA 
#> 6 NA          NA       NA 

In this example you need to convert height to double: it is stored as an integer, mean() returns a double, and if_else() is type-strict, so its true and false values must have the same type. (dplyr 1.1.0 relaxed this by letting if_else() use the common type of its inputs, but as.double() is harmless either way.)
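A minimal base-R sketch of the type issue, on a hypothetical data frame: mean() of an integer column returns a double, and base replace() (mentioned in the question) silently coerces the integer column to double, so no as.double() is needed there. Using %in% instead of == keeps rows with a missing species out of the selection.

```r
# Hypothetical toy data; height is integer, as in starwars
df <- data.frame(
  species = c("Human", "Human", "Human", "Droid", NA),
  height  = c(180L, 170L, NA, NA, NA)
)

typeof(df$height)                      # "integer"
typeof(mean(df$height, na.rm = TRUE))  # "double"

# %in% gives FALSE (not NA) for a missing species, so those rows are left alone
is_human <- df$species %in% "Human"
human_mean <- mean(df$height[is_human], na.rm = TRUE)

# replace() assigns a double into an integer vector, so R coerces it to double
df$height <- replace(df$height, is_human & is.na(df$height), human_mean)

typeof(df$height)  # "double"
df$height          # 180 170 175 NA NA
```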

Upvotes: 2
