Maria Provelegiou

Reputation: 81

Ggplot, Last Geom_point colouring overwrites the first colouring

How can I give each set of points its own colour? As the code below shows, the last colour overwrites the first, so my legend shows data1 and data2 in the same colour, which is not what I want.

ggplot(data,aes(x,y,color,group))+
geom_point(data1,aes(fill="data1"),shape="*",size=12,color="blue")+
geom_point(data2,aes(fill="data2"),shape="*",size=12,color="red")

Just to highlight that data1 and data2 were derived from data based on certain conditions.

Upvotes: 0

Views: 623

Answers (1)

chemdork123

Reputation: 13893

First, let's talk about what's really going on in your example. Then, I'll provide two ways to solve it.

What is Really Happening in OP's example

Here's a reprex of OP's example:

library(ggplot2)

set.seed(8675309)
data1 <- data.frame(x=1:10, y=rnorm(10, 10))
data2 <- data.frame(x=sample(1:10, 10, replace=TRUE), y=rnorm(10, 11, 0.4))

ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(fill='data1'), shape='*', size=12, color='blue') +
  geom_point(data=data2, aes(fill='data2'), shape='*', size=12, color='red')

At first glance, it looks like the points are colored as we wanted, but the legend is colored only by the last geom_point. But... that's not actually what's going on here. In reality, each legend key shows a red point drawn on top of a blue point. We can demonstrate this very clearly by changing the shape of the blue point:

ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(fill='data1'), size=12, color='blue') +
  geom_point(data=data2, aes(fill='data2'), shape='*', size=12, color='red')

The reason for the overplotting is simple: OP has mapped the fill aesthetic inside aes() and then set color to a constant outside aes(). The legend therefore reflects the difference in fill, not the difference in color. Since "*" is not a shape that has a fill, the two legend keys show no visible difference, and each key simply draws both layers' points, red over blue.
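One way to check this, as a minimal sketch that reuses data1 and data2 from the reprex above, is to inspect the computed layer data with ggplot2's layer_data(). The name p is only an illustrative variable, not something from OP's code.

```r
library(ggplot2)

set.seed(8675309)
data1 <- data.frame(x=1:10, y=rnorm(10, 10))
data2 <- data.frame(x=sample(1:10, 10, replace=TRUE), y=rnorm(10, 11, 0.4))

# p is an illustrative name for OP's plot
p <- ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(fill='data1'), shape='*', size=12, color='blue') +
  geom_point(data=data2, aes(fill='data2'), shape='*', size=12, color='red')

# colour is the constant set outside aes(); fill is what the scale (and legend) tracks
unique(layer_data(p, 1)[, c('colour', 'fill')])
unique(layer_data(p, 2)[, c('colour', 'fill')])
```

Each layer should report its own constant colour, while the two fill values come from the default fill scale. That fill scale is what produces the two legend entries, even though the "*" shape never displays a fill.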

How to Fix

There are two ways to fix this. Both involve moving color from outside aes() to inside aes(). One way maintains the two datasets data1 and data2 as separate data frames as OP has it, where we have a geom_point call for each dataset, and the second way applies Tidy Data principles and is generally much better practice for plotting with ggplot2.

The non-Tidy Way

Move color inside aes() for both geom_point calls and remove fill, since it doesn't apply here. ggplot2 will then create a color legend with the entries "data1" and "data2". Colors are chosen automatically, but if we want to specify them, we can use scale_color_manual():

ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(color='data1'), shape='*', size=12) +
  geom_point(data=data2, aes(color='data2'), shape='*', size=12) +
  scale_color_manual(values=c('blue', 'red'))

By the way, if you keep color inside and outside aes(), the color outside of aes() will overwrite the one inside the aes() function. This means your points will be the right color, but no legend is drawn.
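A small sketch of that behaviour, assuming only ggplot2 and the data1 frame from the reprex (p_both and the color 'green' are illustrative choices, picked so the constant is easy to tell apart from the default palette):

```r
library(ggplot2)

set.seed(8675309)
data1 <- data.frame(x=1:10, y=rnorm(10, 10))

# color appears both inside aes() (mapped) and outside aes() (constant)
p_both <- ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(color='data1'), shape='*', size=12, color='green')

# the colour used for drawing is the constant, not a value from the colour scale
unique(layer_data(p_both, 1)$colour)
```

Printing p_both should show the points in the constant color with no color legend, since the constant takes precedence over the mapping for that layer.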

The Tidy Data Way

Again, this is the strongly preferred way. The idea is to combine your datasets into one, adding a column that records which dataset each row came from. You then map that column to color, which also labels the legend. You only need one call to geom_point to make this work. It may not look like much of an improvement in this particular example, but consider what the difference would be if you had 10 datasets.

library(dplyr)

# passing a named list makes bind_rows() fill the id column with the list names
df <- bind_rows(list(data1=data1, data2=data2), .id="id")

ggplot(df, aes(x=x, y=y, color=id)) + geom_point(shape='*', size=12) +
  scale_color_manual(values=c('blue', 'red'))

The resulting plot is identical to the other one.
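To illustrate how this scales, here is a rough sketch with five data frames instead of two. The list name datasets, the number of frames, and the way they are simulated are all assumptions made for the example; only bind_rows() with .id and the single geom_point call come from the approach above.

```r
library(dplyr)
library(ggplot2)

set.seed(8675309)
# hypothetical: five data frames simulated the same way as data1
datasets <- lapply(1:5, function(i) data.frame(x=1:10, y=rnorm(10, mean=9 + i)))
names(datasets) <- paste0('data', 1:5)

# the list names end up in the id column
df_many <- bind_rows(datasets, .id='id')

# still a single geom_point call, however many datasets there are
p_many <- ggplot(df_many, aes(x=x, y=y, color=id)) +
  geom_point(shape='*', size=12)
```

With separate data frames, the same plot would need one geom_point call per dataset.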

EDIT: What if there already is a color aesthetic?

While not a part of the question, OP indicated that in their particular case there was already a color aesthetic defined, so the values passed to scale_color_manual() were not sufficient. There are some options for how to proceed here:

  • In OP's case, the message indicated they needed to provide 6, not 2 values. OP can try to supply 6 colors and map them accordingly using a named vector, i.e. c("data1" = "blue", "data2" = "red", ...).
  • Use a point shape that has a fill color and use fill for the separate legend.
  • Use the same asterisk * point shape and color, but override the aesthetics in the legend.

Without the actual data from the OP and the code they are using specifically that includes the conflicting color aesthetic, it's difficult to suggest the best course for that particular case; however, I'll demonstrate the final two approaches here:

Use point shape with a fill color

ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(fill='data1'), shape=21, size=12, color=NA) +
  geom_point(data=data2, aes(fill='data2'), shape=21, size=12, color=NA) +
  scale_fill_manual(values=c('data1'='blue', 'data2'='red'))

Override the aesthetics of the color legend

ggplot(mapping= aes(x=x, y=y)) +
  geom_point(data=data1, aes(fill='data1'), shape='*', size=12, color='blue') +
  geom_point(data=data2, aes(fill='data2'), shape='*', size=12, color='red') +
  guides(
    fill=guide_legend(override.aes = list(color=c('blue','red')))
  )
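For completeness, the first option in the list above (a named vector that covers every color level) might look something like the sketch below. Since OP's real data is not available, everything about the base data is hypothetical: the frame base, its group column with four levels, the conditions used to derive data1 and data2, and the grey palette are assumptions chosen only so that 4 + 2 = 6 colors are needed.

```r
library(ggplot2)

# hypothetical base data that already maps color to a 4-level column
base <- data.frame(
  x = rep(1:10, times=4),
  y = rep(c(9, 10, 11, 12), each=10),
  group = rep(c('a', 'b', 'c', 'd'), each=10)
)
# hypothetical conditions standing in for how OP derived the subsets
data1 <- subset(base, x <= 2)
data2 <- subset(base, x >= 9)

# one named entry per level: four existing groups plus the two subsets
pal <- c(a='grey20', b='grey40', c='grey60', d='grey80',
         data1='blue', data2='red')

p_named <- ggplot(base, aes(x=x, y=y, color=group)) +
  geom_point() +
  geom_point(data=data1, aes(color='data1'), shape='*', size=12) +
  geom_point(data=data2, aes(color='data2'), shape='*', size=12) +
  scale_color_manual(values=pal)
```

Because the vector is named, each level gets its color by name rather than by position, so the order of the entries does not matter.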

Upvotes: 1
