Plotting colors by value of a variable not being plotted in {ggplot2}

I am trying to code a plot using the data frame 'swiss' from {datasets} using {ggplot2}. I am plotting Infant.Mortality on the x-axis and Fertility on the y-axis, and I want the points to be colored such that they are a transparent blue or orange depending on if they are above or below the median value for Education. However, when I plot, I only get transparent blue points and the legend titles are off.

This is the code I have so far:

swiss$color[swiss$Education >= median(swiss$Education)] <- tBlue 
swiss$color[swiss$Education < median(swiss$Education)] <- tOrange

ggplot(data = swiss) + 
 geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = color)) + 
 scale_color_manual(values = swiss$color,
                    labels = ">= median", "<median")

I've also tried what was explained in this question (ggplot geom_point() with colors based on specific, discrete values) but I couldn't get it to work.

I am very new to ggplot, so any advice is appreciated!!

Upvotes: 1

Views: 308

Answers (1)

Gregor Thomas

Reputation: 146144

With ggplot we don't normally create a column of color names (that is common in base graphics). Instead, the usual way is to create a column in your data with meaningful labels, like this:

swiss$edu_med = ifelse(swiss$Education >= median(swiss$Education), ">= Median", "< Median")

ggplot(data = swiss) + 
 geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = edu_med)) + 
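 # name the values so each label gets the intended color regardless of level order
 scale_color_manual(values = c(">= Median" = tBlue, "< Median" = tOrange))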

The legend labels will be automatically generated from the data values.
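For a version that runs from a clean session, here is a minimal sketch of the same approach. The question never shows how tBlue and tOrange are defined, so the adjustcolor() calls below are an assumption, as are the names swiss_plot and p and the legend title "Education".

```r
library(ggplot2)

# Assumed definitions: semi-transparent blue and orange (not shown in the question)
tBlue   <- adjustcolor("blue", alpha.f = 0.5)
tOrange <- adjustcolor("orange", alpha.f = 0.5)

# Work on a copy so the built-in swiss data frame is left untouched
swiss_plot <- datasets::swiss

# Label each row by whether Education is at/above or below the median
swiss_plot$edu_med <- ifelse(
  swiss_plot$Education >= median(swiss_plot$Education),
  ">= Median",
  "< Median"
)

# Map the label column to color; the named vector ties each label to a color
p <- ggplot(data = swiss_plot) +
  geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = edu_med)) +
  scale_color_manual(
    name = "Education",
    values = c(">= Median" = tBlue, "< Median" = tOrange)
  )

p
```

Naming the elements of values avoids relying on the sort order of the labels, which would otherwise decide which label gets which color.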

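It is possible to do it the way you have in the question. In that case, use scale_color_identity() instead of scale_color_manual(), for example scale_color_identity(guide = "legend", breaks = c(tBlue, tOrange), labels = c(">= median", "< median")). Note that the labels must be wrapped in c(), and that identity scales draw no legend unless you set guide = "legend".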
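A minimal sketch of that identity-scale variant, again assuming adjustcolor() definitions for tBlue and tOrange (the question does not show them) and using the hypothetical names swiss_id and p_id:

```r
library(ggplot2)

# Assumed definitions: semi-transparent blue and orange (not shown in the question)
tBlue   <- adjustcolor("blue", alpha.f = 0.5)
tOrange <- adjustcolor("orange", alpha.f = 0.5)

# Work on a copy so the built-in swiss data frame is left untouched
swiss_id <- datasets::swiss

# Store the actual color value for each row, as in the question
swiss_id$color <- ifelse(
  swiss_id$Education >= median(swiss_id$Education),
  tBlue,
  tOrange
)

# The identity scale uses the column values as colors directly;
# breaks and labels are given in matching order for the legend
p_id <- ggplot(data = swiss_id) +
  geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = color)) +
  scale_color_identity(
    name = "Education",
    guide = "legend",
    breaks = c(tBlue, tOrange),
    labels = c(">= median", "< median")
  )

p_id
```

Giving breaks explicitly keeps each label paired with its color rather than depending on how the color strings happen to sort.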

Upvotes: 2
