Thomas

Reputation: 2534

ggplot2: How to get symbols to stay the same for factors in two different scatterplots?

This problem has been driving me crazy. I am trying to make scatterplots from two different datasets stored in one data frame. My data frame is:

structure(list(x1 = c(5L, 3L, 4L, 5L, 4L, 8L, 5L, 6L, 3L, 4L, 
5L, 6L, 8L, 4L), y1 = c(7L, 5L, 6L, 4L, 1L, 5L, 6L, 9L, 8L, 4L, 
5L, 6L, 7L, 8L), class1 = structure(c(1L, 2L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"), 
x2 = c(4L, 8L, 7L, 5L, 6L, 2L, 5L, 4L, 5L, NA, NA, NA, NA, 
NA), y2 = c(7L, 5L, 1L, 4L, 5L, 8L, 4L, 5L, 8L, NA, NA, NA, 
NA, NA), class2 = structure(c(3L, 2L, 2L, 2L, 3L, 2L, 2L, 
3L, 3L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "A", "B"), class = "factor")), .Names = c("x1", 
"y1", "class1", "x2", "y2", "class2"), class = "data.frame", row.names = c(NA, 
-14L))

and looks like this:

x1  y1  class1  x2  y2  class2
5   7   A       4   7   B
3   5   B       8   5   A
4   6   A       7   1   A
5   4   A       5   4   A
4   1   B       6   5   B
8   5   B       2   8   A
5   6   B       5   4   A
6   9   B       4   5   B
3   8   B       5   8   B
4   4   A
5   5   A
6   6   A
8   7   A
4   8   A

I want to plot two scatterplots:

  1. x1 vs y1
  2. x2 vs y2

In each scatterplot, I want the symbol shape to be determined by the class column (class1 for the first plot, class2 for the second). Since every class is either A or B, I want A and B to use the same symbols in both plots.

I am using the following code to try and do this:

library(ggplot2)
theme_set(theme_bw()) # omit grey background

qplot(x1, y1, data=df, shape=I(21), fill=I("gray"), size=I(4), alpha=I(0)) +
  stat_smooth(method="lm", se=FALSE, colour="black", size=1) +
  geom_point(shape=factor(class1), size=I(4))

qplot(x2, y2, data=df, shape=I(21), fill=I("gray"), size=I(4), alpha=I(0)) +
  stat_smooth(method="lm", se=FALSE, colour="black", size=1) +
  geom_point(shape=factor(class2), size=I(4))

It works fine if x1/y1 and x2/y2 have the same length: in that case the symbols stay the same in both plots. However, if the two datasets have different lengths (as in the example data frame above), a third symbol is introduced into the second plot.

Does anyone know how I can get the same symbols for A and B in both plots?

EDIT: If I try the method suggested below by Didzis Elferts:

ggplot(df,aes(x1,y1,shape=class1))+geom_point(size=4)+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))

ggplot(df,aes(x2,y2,shape=class2))+geom_point(size=4)+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))

I get this error:

Error: Insufficient values in manual scale. 3 needed but only 2 provided.

EDIT 2: Didzis Elferts recommended the following solution:

df$class2<-factor(df$class2,levels=c("A","B"))

However, when I try to add a regression line to each scatterplot using

ggplot(df,aes(x1,y1,shape=class1))+geom_point(size=4)+
  scale_shape_manual(breaks=c("A","B"),values=c(15,16))+
  stat_smooth(method="lm", se=FALSE, colour="black", size=1)

qplot(x2, y2, data=df, shape=class1)+
  stat_smooth(method="lm", se=FALSE, colour="black", size=1)+
  geom_point(size=4)+
  scale_shape_manual(breaks=c("A","B"),values=c(15,16))

ggplot2 adds a separate regression line for each class. Instead, I need just a single regression line based on the data from both classes together (even though they have different symbols).

Upvotes: 4

Views: 1549

Answers (1)

Didzis Elferts

Reputation: 98429

The problem is that in your data the empty cells of class2 are stored as an empty string "", and that empty string is one of the factor levels.

str(df$class2)
 Factor w/ 3 levels "","A","B": 3 2 2 2 3 2 2 3 3 1 ...

You can turn these empty cells into NA by rebuilding the factor with only the levels you want to keep.

df$class2<-factor(df$class2,levels=c("A","B"))
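
To see what that line does in isolation, here is a minimal base R sketch on a toy vector (the vector and the name class2_clean are made up for illustration, they are not the data from the question). Any value that is not listed in levels= becomes NA, so the empty string stops being a level:

```r
# Toy factor mirroring class2: the empty string is a real level here.
class2 <- factor(c("B", "A", "A", "", ""))
levels(class2)        # "" "A" "B"

# Rebuilding the factor with only the wanted levels turns "" into NA.
class2_clean <- factor(class2, levels = c("A", "B"))
levels(class2_clean)  # "A" "B"
is.na(class2_clean)   # FALSE FALSE FALSE TRUE TRUE
```

With only two levels left, a manual scale with two values is sufficient and the "3 needed but only 2 provided" error should no longer appear.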

One way to ensure that both plots have the same legend and symbols is to use scale_shape_manual() and set the breaks= and values= (shape symbols) you need.

ggplot(df,aes(x1,y1,shape=class1))+geom_point(size=4)+
    scale_shape_manual(breaks=c("A","B"),values=c(15,16))

ggplot(df,aes(x2,y2,shape=class2))+geom_point(size=4)+
    scale_shape_manual(breaks=c("A","B"),values=c(15,16))

Update - one regression line

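To get only one regression line, shape= should be mapped inside the aes() of geom_point() instead of inside ggplot(). Aesthetics set in ggplot() are inherited by every layer, so with shape= there stat_smooth() also splits the data by class and fits one line per class.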

ggplot(df,aes(x2,y2))+geom_point(size=4,aes(shape=class2))+
           scale_shape_manual(breaks=c("A","B"),values=c(15,16))+
           stat_smooth(method="lm", se=FALSE, colour="black", size=1)
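
Putting the pieces together, here is a self-contained sketch for both plots. The data are retyped from the question, with class2 built directly with levels A and B. Two things differ from the code above and are my assumptions: the shapes are passed as a named vector (class_shapes is a made-up name) so that each shape is tied to its level by name, and the line width of the smoother is left at its default because the argument name differs between ggplot2 versions.

```r
library(ggplot2)

# Data from the question; class2 is created with only levels A and B,
# so the five padding rows become NA instead of a third level.
df <- data.frame(
  x1 = c(5, 3, 4, 5, 4, 8, 5, 6, 3, 4, 5, 6, 8, 4),
  y1 = c(7, 5, 6, 4, 1, 5, 6, 9, 8, 4, 5, 6, 7, 8),
  class1 = factor(c("A", "B", "A", "A", "B", "B", "B",
                    "B", "B", "A", "A", "A", "A", "A")),
  x2 = c(4, 8, 7, 5, 6, 2, 5, 4, 5, NA, NA, NA, NA, NA),
  y2 = c(7, 5, 1, 4, 5, 8, 4, 5, 8, NA, NA, NA, NA, NA),
  class2 = factor(c("B", "A", "A", "A", "B", "A", "A",
                    "B", "B", "", "", "", "", ""),
                  levels = c("A", "B"))
)

# Assumption: named values instead of unnamed values plus breaks=.
# Names are matched to factor levels, so A is always 15 and B always 16.
class_shapes <- c(A = 15, B = 16)

# shape is mapped only in geom_point(), so stat_smooth() sees one group
# and fits a single line through all points.
p1 <- ggplot(df, aes(x1, y1)) +
  geom_point(aes(shape = class1), size = 4) +
  scale_shape_manual(values = class_shapes) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

p2 <- ggplot(df, aes(x2, y2)) +
  geom_point(aes(shape = class2), size = 4) +
  scale_shape_manual(values = class_shapes) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

print(p1)
print(p2)
```

The second plot will warn about removed rows, because the last five rows of x2, y2 and class2 are NA; those rows are simply not drawn.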

Upvotes: 5
