Mauricio Calvao

Reputation: 475

When does the argument go inside or outside aes()?

I am following Chapter 1 of Wickham and Grolemund's "R for data science" on visualization.

I have tried:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

hoping to get a plot with all points colored blue, but instead, to my surprise, they were all red! Looking up the correct code for blue points, on page 11 of the printed version or in Section 3.3 of the online version, I found it should be

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

and, in fact, they state that to set an aesthetic manually you have to give it outside the aes() function, but inside the corresponding geom, geom_point() here. Why is that? What is the exact explanation for this behavior? The syntax of the first command seemed more natural to me. I guess this issue is related to layers and/or to the scope of variables, but I just could not get the hang of it... Can someone spoon-feed me?

Edit: Sorry for not doing my homework properly: this is just Exercise 1, proposed at the end of the corresponding section of the text itself... The answer, however, still escapes me.

Upvotes: 7

Views: 3939

Answers (3)

HM_ft

Reputation: 156

This is quite an old post, but I was stuck with the same problem for hours, and this discussion helped me to make things more clear. So here I go with a short answer.

Using your first line of code (where color goes inside aes()) will not color the points blue.

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

Why not? If you check what goes inside aes(), you find displ (your x variable) and hwy (your y variable). How does "blue" fit in here? Not as a colour. Inside aes(), "blue" is treated as data: a constant string recycled to every row, i.e. a variable with a single value. That value is passed through the default colour scale, so all points get its first colour (the reddish one you saw), and "blue" only appears as a label in the legend (here "blue" could have been any string).
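
One way to check this, assuming a ggplot2 version that provides layer_data() (which returns a layer's data after scales have been applied), is to look at the colours ggplot2 actually computes for the first command. The variable names p_mapped and drawn are just for illustration.

```r
library(ggplot2)

# A constant string inside aes() is treated as data: a discrete
# variable with a single value, not as a colour name.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# layer_data() builds the plot and returns the per-point data,
# including the colour each point is drawn with.
drawn <- layer_data(p_mapped)

# One single colour for all points: the first colour of the
# default discrete palette, not "blue".
unique(drawn$colour)
```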

In your second line of code, color goes outside aes(), and as you see, it works. In this case nothing is mapped to colour, every point simply gets the one fixed colour, so no legend is needed.

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
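
For comparison, a small sketch (again assuming layer_data() is available) of what ggplot2 computes when the colour is set outside aes(): no colour scale is trained, and the fixed value is copied to every point as-is.

```r
library(ggplot2)

# Colour given as a fixed parameter of the geom, outside aes().
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Every point carries the literal value "blue".
drawn <- layer_data(p_set)
unique(drawn$colour)
```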


If you want to control the specific colours used when the colour aesthetic is mapped to a third variable (drv in this case), use scale_color_manual().

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
  scale_color_manual(values = c("green", "yellow", "red"))
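
A variant worth knowing: an unnamed values vector is matched to the levels of drv in order ("4", "f", "r"). The sketch below instead passes a named vector so that each level is pinned to a colour explicitly; this is a choice of mine, not something the answer above requires. It assumes layer_data() keeps the rows in the same order as mpg, which holds for geom_point() with the default identity stat.

```r
library(ggplot2)

# Named values tie each level of drv to a colour explicitly.
p_manual <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
  scale_color_manual(values = c("4" = "green", f = "yellow", r = "red"))

drawn <- layer_data(p_manual)

# Cross-tabulate the data value against the colour actually drawn.
table(mpg$drv, drawn$colour)
```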


Upvotes: 1

Sarah

Reputation: 3499

I remember how completely confused I was by this when I started using ggplot.

To build on @Mauricio Calvao's answer: use color inside aes() to split the colours in the plot by a variable of the data frame you are plotting, e.g.:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv))

So when color (or size, linetype and similar aesthetics) is inside aes(), it is really asking which object/variable should determine the colour groups. If this is a string (e.g. "blue"), all points are put into one group, but the name of that group has nothing to do with the actual colour.
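
The grouping described above can be inspected directly, assuming layer_data() is available: its group column records which group each point ended up in. The object names are illustrative only.

```r
library(ggplot2)

# A constant inside aes() puts every point into the same group ...
p_const <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = "blue"))
# ... whereas a variable gives one group per distinct value.
p_var <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = drv))

n_const <- length(unique(layer_data(p_const)$group))  # 1
n_var <- length(unique(layer_data(p_var)$group))      # 3, one per level of drv
```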

To assign specific colours to the groups created by color inside aes(), use scale_colour_manual():

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv))+
    scale_colour_manual(values = c("black","blue","orange"))

Upvotes: 2

Mauricio Calvao

Reputation: 475

This issue, and more specifically the difference between the outputs of the two commands in the question, is dealt with explicitly in Section 5.4.2 of the 2nd edition of "ggplot2: Elegant Graphics for Data Analysis", by Hadley Wickham himself:

Either:

  • you can map (inside aes) a variable of your data to an aesthetic, e.g., aes(..., color = VarX), or ...
  • you can set (outside aes, but inside a geom element) an aesthetic to a constant value e.g. "blue"

In the first case, mapping an aesthetic such as color, ggplot2 treats the constant as a discrete variable with a single value and gives it the first colour of its default discrete palette (hues evenly spaced around the colour wheel); why should that colour coincide with the constant value you happened to map from? More explicitly, if you try the command:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "foo"))

you get the same plot as with the first command of the original question; only the legend text changes ("foo" instead of "blue").
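
To confirm that the string's content plays no role in the drawn colour, a short sketch (assuming layer_data() is available) compares the computed colours of the "blue" and "foo" versions; the object names are hypothetical.

```r
library(ggplot2)

# Two mappings that differ only in the constant string.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
p_foo <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "foo"))

# Both strings are one-level variables, so they get the same colour.
same_colours <- identical(layer_data(p_blue)$colour, layer_data(p_foo)$colour)
same_colours
```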

Upvotes: 9
