Reputation: 369
I'd like to have a function like scale_color_sex which assigns predefined colors to the categories of the color variable. I have stored these predefined values in a named character vector, which I pass to a modified scale_color_manual (see the code below).
Ideally, a user of my custom function scale_color_sex provides only some data (here starwars) and the name of the sex category (here sex), and scale_color_sex assigns the right color to geom_point. The code below produces the desired result.
But: I'd like to drop colors from the legend that are not represented in the data. In this example it is the "NotInData" category in "red", which I don't want to see in the plot. How can I achieve this dynamically?
Bonus points: Can I use some kind of regular expressions on the left-hand side of my color palette / named character vector?
Any advice, also on other ways to construct a color palette based on the values of sex, is highly welcome!
library(tidyverse)

scale_color_sex <- function(...) {
  scale_color_manual(
    ...,
    values = c(
      female = "#9986A5",
      hermaphroditic = "#79402E",
      male = "#CCBA72",
      none = "#0F0D0E",
      NotInData = "red"
    )
  )
}

starwars %>%
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()
#> Warning: Removed 44 rows containing missing values (geom_point).
Upvotes: 1
Views: 576
Reputation: 10627
Take a look at the drop argument of scale_color_manual:
library(tidyverse)

scale_color_sex <- function(..., drop = FALSE) {
  scale_color_manual(
    ...,
    # drop only concerns unused factor levels
    drop = drop,
    # limits = force keeps the limits computed from the data,
    # so only values that actually occur get a legend key
    limits = force,
    values = c(
      female = "#9986A5",
      hermaphroditic = "#79402E",
      male = "#CCBA72",
      none = "#0F0D0E",
      NotInData = "red"
    )
  )
}

starwars %>%
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()
#> Warning: Removed 45 rows containing missing values (geom_point).
Created on 2021-09-10 by the reprex package (v2.0.1)
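As an alternative sketch (not part of the answer above), the values vector could be built from the data itself, which drops unused colors and also allows regular expressions on the left-hand side, as asked in the bonus question. The names sex_patterns and palette_from_patterns and the fallback color "grey50" are assumptions made up for this illustration; only base R is used.

```r
# Hypothetical lookup table: names are regular expressions, values are colors.
# Anchors (^ and $) matter here: a bare "male" would also match "female".
sex_patterns <- c(
  "^female$"    = "#9986A5",
  "^herm"       = "#79402E",
  "^male$"      = "#CCBA72",
  "^none$"      = "#0F0D0E",
  "^NotInData$" = "red"
)

# Hypothetical helper: returns a named color vector containing only the
# values that occur in x. The first matching pattern wins; values that
# match no pattern get the (assumed) default color.
palette_from_patterns <- function(x, patterns = sex_patterns, default = "grey50") {
  present <- sort(unique(as.character(x[!is.na(x)])))
  vapply(present, function(value) {
    hit <- which(vapply(names(patterns), function(p) grepl(p, value), logical(1)))
    if (length(hit) > 0) unname(patterns[hit[1]]) else default
  }, character(1))
}

# "NotInData" does not occur in the input, so it gets no entry.
palette_from_patterns(c("male", "female", "hermaphroditic", NA))
```

The result can be passed on as scale_color_manual(values = palette_from_patterns(starwars$sex)). Since the vector then contains only values that occur in the data, no extra legend keys should appear.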
Upvotes: 2
Reputation: 160447
Just because you cannot see red dots does not mean they should be dropped.
Minor point: in my version of starwars, I have no NotInData, but I do have NA. I'll reassign those for this discussion.
starwars$sex[is.na(starwars$sex)] <- "NotInData"
First, most of the rows with "NotInData" in sex have no height/birth_year to plot:
library(dplyr)
filter(starwars, sex == "NotInData")
# # A tibble: 4 x 14
# name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles starships
# <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <list> <list> <list>
# 1 Ric Olié 183 NA brown fair blue NA NotInData <NA> Naboo <NA> <chr [1]> <chr [0]> <chr [1]>
# 2 Quarsh Panaka 183 NA black dark brown 62 NotInData <NA> Naboo <NA> <chr [1]> <chr [0]> <chr [0]>
# 3 Sly Moore 178 48 none pale white NA NotInData <NA> Umbara <NA> <chr [2]> <chr [0]> <chr [0]>
# 4 Captain Phasma NA NA unknown unknown unknown NA NotInData <NA> <NA> <NA> <chr [1]> <chr [0]> <chr [0]>
One row does have both values; let's look at it more closely:
filter(starwars, between(height, 180, 190), between(birth_year, 60, 70))
# # A tibble: 3 x 14
# name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles starships
# <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <list> <list> <list>
# 1 Wilhuff Tarkin 180 NA auburn, grey fair blue 64 male masculine Eriadu Human <chr [2]> <chr [0]> <chr [0]>
# 2 Quarsh Panaka 183 NA black dark brown 62 NotInData <NA> Naboo <NA> <chr [1]> <chr [0]> <chr [0]>
# 3 Jango Fett 183 79 black tan brown 66 male masculine Concord Dawn Human <chr [1]> <chr [0]> <chr [0]>
The middle row is fairly close to the other two in both values, so it is likely being masked in your plot. Let's plot a subset of the data.
filter(starwars, between(height, 150, 200), between(birth_year, 50, 100)) %>%
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()
This shows the red "NotInData" dot in the middle. (If you were to plot the whole data at a larger scale/resolution, you might see it stand out.)
If you don't want to plot them, however, the best method is to filter them out before sending the data to ggplot; by default they will then be removed from the legend.
filter(starwars, sex != "NotInData") %>%
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()
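To check up front which categories have at least one point that can actually be drawn (geom_point removes rows with a missing x or y), something like the following base R sketch may help. The toy data frame and the helper name plottable_levels are made up for illustration; they are not the real starwars rows.

```r
# Toy data, invented for this example (not the real starwars rows).
toy <- data.frame(
  sex        = c("male", "female", "NotInData", "NotInData", "male"),
  height     = c(180, 165, 183, NA, 183),
  birth_year = c(64, NA, 62, NA, 66)
)

# Hypothetical helper: categories of `group` that have at least one row
# where the grouping variable and both plotted variables are non-missing.
plottable_levels <- function(data, group, x, y) {
  keep <- complete.cases(data[c(group, x, y)])
  unique(as.character(data[[group]][keep]))
}

# "female" has no complete row here, while "NotInData" has exactly one.
plottable_levels(toy, "sex", "height", "birth_year")
```

A category returned by such a check is present in the plot even if its points are hard to spot, which is the situation described above for the single "NotInData" row.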
Upvotes: 0