zerocool

Reputation: 369

Drop colors from legend which are not in data

I'd like to have a function like scale_color_sex which assigns predefined colors to the categories of the color variable. I have stored these predefined values in a named character vector, which I pass to a wrapped scale_color_manual (see the code below).

Ideally, a user of my custom function scale_color_sex provides only some data (here starwars) and the name of the sex category (here sex) and scale_color_sex assigns the right color to geom_point. The code below produces the desired result.

However, I'd like to drop colors from the legend that are not represented in the data. In this example, that is the "NotInData" category in "red", which I don't want to see in the plot. How can I achieve this dynamically?

Bonus points: Can I use some kind of regular expression on the left-hand side of my color palette / named character vector?

Any advice, including other ways to construct a color palette based on the values of sex, is highly welcome!

library(tidyverse)

scale_color_sex <- function(...){
  scale_color_manual(
    ...,
    values = c(
      female = "#9986A5", 
      hermaphroditic = "#79402E", 
      male = "#CCBA72",
      none = "#0F0D0E", 
      NotInData = "red"
    )
  )
}

starwars %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() + 
  scale_color_sex()
#> Warning: Removed 44 rows containing missing values (geom_point).

Upvotes: 1

Views: 576

Answers (2)

danlooo

Reputation: 10627

Take a look at the drop and limits arguments of scale_color_manual:

library(tidyverse)

scale_color_sex <- function(..., drop = FALSE){
  scale_color_manual(
    ...,
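    drop = drop,     # only affects unused factor levels
    limits = force,  # limits function: returns the values found in the data unchanged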
    values = c(
      female = "#9986A5", 
      hermaphroditic = "#79402E", 
      male = "#CCBA72",
      none = "#0F0D0E", 
      NotInData = "red"
    )
  )
}

starwars %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() + 
  scale_color_sex()
#> Warning: Removed 45 rows containing missing values (geom_point).

Created on 2021-09-10 by the reprex package (v2.0.1)
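
An alternative that does not depend on how the scale computes its limits is to subset the palette yourself before handing it to scale_color_manual. The following is only a minimal sketch: the names sex_palette, palette_in_data and scale_color_sex_in are hypothetical, and it assumes you are willing to pass the observed values (for example starwars$sex) to the scale function explicitly.

```r
# Full palette, copied from the question.
sex_palette <- c(
  female = "#9986A5",
  hermaphroditic = "#79402E",
  male = "#CCBA72",
  none = "#0F0D0E",
  NotInData = "red"
)

# Hypothetical helper: keep only the palette entries whose names occur in `x`.
# NA values in `x` never match a name, so they are ignored here.
palette_in_data <- function(x, palette = sex_palette) {
  palette[names(palette) %in% unique(x)]
}

# Hypothetical variant of scale_color_sex() that takes the observed values.
scale_color_sex_in <- function(x, ...) {
  ggplot2::scale_color_manual(..., values = palette_in_data(x))
}

# The reduced palette no longer contains "hermaphroditic" or "NotInData".
palette_in_data(c("male", "female", NA, "none"))
```

Usage would then look like ggplot(starwars, aes(height, birth_year, color = sex)) + geom_point() + scale_color_sex_in(starwars$sex). The price is that the data column has to be named twice, once in aes() and once in the scale call.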

Upvotes: 2

r2evans

Reputation: 160447

Just because you cannot see red dots does not mean they should be dropped.

Minor point: in my version of starwars, I have no NotInData, but I do have NA. I'll reassign those for this discussion.

starwars$sex[is.na(starwars$sex)] <- "NotInData"

First, most of the rows with "NotInData" in sex have no height/birth_year to plot:

library(dplyr)
filter(starwars, sex == "NotInData")
# # A tibble: 4 x 14
#   name           height  mass hair_color skin_color eye_color birth_year sex       gender homeworld species films     vehicles  starships
#   <chr>           <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>     <chr>  <chr>     <chr>   <list>    <list>    <list>   
# 1 Ric Olié          183    NA brown      fair       blue              NA NotInData <NA>   Naboo     <NA>    <chr [1]> <chr [0]> <chr [1]>
# 2 Quarsh Panaka     183    NA black      dark       brown             62 NotInData <NA>   Naboo     <NA>    <chr [1]> <chr [0]> <chr [0]>
# 3 Sly Moore         178    48 none       pale       white             NA NotInData <NA>   Umbara    <NA>    <chr [2]> <chr [0]> <chr [0]>
# 4 Captain Phasma     NA    NA unknown    unknown    unknown           NA NotInData <NA>   <NA>      <NA>    <chr [1]> <chr [0]> <chr [0]>

One row does have both values, so let's look at it more closely:

filter(starwars, between(height, 180, 190), between(birth_year, 60, 70))
# # A tibble: 3 x 14
#   name           height  mass hair_color   skin_color eye_color birth_year sex       gender    homeworld    species films     vehicles  starships
#   <chr>           <int> <dbl> <chr>        <chr>      <chr>          <dbl> <chr>     <chr>     <chr>        <chr>   <list>    <list>    <list>   
# 1 Wilhuff Tarkin    180    NA auburn, grey fair       blue              64 male      masculine Eriadu       Human   <chr [2]> <chr [0]> <chr [0]>
# 2 Quarsh Panaka     183    NA black        dark       brown             62 NotInData <NA>      Naboo        <NA>    <chr [1]> <chr [0]> <chr [0]>
# 3 Jango Fett        183    79 black        tan        brown             66 male      masculine Concord Dawn Human   <chr [1]> <chr [0]> <chr [0]>

The middle row is fairly close to the other two in both height and birth_year, so it is likely being masked in your plot. Let's plot that data.

filter(starwars, between(height, 150, 200), between(birth_year, 50, 100)) %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() + 
  scale_color_sex()

[Image: ggplot with red dot showing]

The plot shows the red "NotInData" dot in the middle. (If you were to plot the whole data on a larger scale/resolution, you might see it break out.)

If you don't want to plot them, however, the best method is to filter them out before sending the data to ggplot; by default, the category will then be removed from the legend.

filter(starwars, sex != "NotInData") %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() + 
  scale_color_sex()

[Image: ggplot with "NotInData" removed from legend]
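
On the bonus question about regular expressions: the names of a named character vector are matched literally against the data values, so patterns on the left-hand side will not work directly. One possible workaround is to expand a set of pattern rules into a literal named palette for the values that actually occur. This is only a sketch under assumptions: color_rules and palette_from_rules are hypothetical names (not ggplot2 functions), the patterns are illustrative, and the first matching pattern wins.

```r
# Hypothetical pattern-to-color rules; names are regular expressions.
color_rules <- c(
  "^fe"    = "#9986A5",
  "^herm"  = "#79402E",
  "^male$" = "#CCBA72",
  "^none$" = "#0F0D0E"
)

# Hypothetical helper: build a literal named palette for the values in `x`.
# Values matching no pattern get `default`; NA values are skipped.
palette_from_rules <- function(x, rules = color_rules, default = "grey50") {
  lv <- sort(unique(x[!is.na(x)]))
  vapply(lv, function(v) {
    # Test every pattern against this single value.
    hit <- which(vapply(names(rules), grepl, logical(1), x = v))
    if (length(hit) > 0) rules[[hit[[1]]]] else default
  }, character(1))
}

# Returns a character vector named by the observed values.
palette_from_rules(c("male", "female", NA, "none", "female"))
```

The result can be passed as scale_color_manual(values = palette_from_rules(starwars$sex)). Because the palette is built from the observed values only, it contains no entries for categories that are absent from the data.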

Upvotes: 0
