Reputation: 2534
This problem has been driving me crazy. I am trying to make scatterplots with two different datasets. My dataframe is
structure(list(x1 = c(5L, 3L, 4L, 5L, 4L, 8L, 5L, 6L, 3L, 4L,
5L, 6L, 8L, 4L), y1 = c(7L, 5L, 6L, 4L, 1L, 5L, 6L, 9L, 8L, 4L,
5L, 6L, 7L, 8L), class1 = structure(c(1L, 2L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
x2 = c(4L, 8L, 7L, 5L, 6L, 2L, 5L, 4L, 5L, NA, NA, NA, NA,
NA), y2 = c(7L, 5L, 1L, 4L, 5L, 8L, 4L, 5L, 8L, NA, NA, NA,
NA, NA), class2 = structure(c(3L, 2L, 2L, 2L, 3L, 2L, 2L,
3L, 3L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "A", "B"), class = "factor")), .Names = c("x1",
"y1", "class1", "x2", "y2", "class2"), class = "data.frame", row.names = c(NA,
-14L))
and looks like this:
x1 y1 class1 x2 y2 class2
 5  7 A       4  7 B
 3  5 B       8  5 A
 4  6 A       7  1 A
 5  4 A       5  4 A
 4  1 B       6  5 B
 8  5 B       2  8 A
 5  6 B       5  4 A
 6  9 B       4  5 B
 3  8 B       5  8 B
 4  4 A
 5  5 A
 6  6 A
 8  7 A
 4  8 A
I want to plot two scatterplots:
x1 vs y1
x2 vs y2
In each scatterplot, I want the symbol shape to be determined by the classes class1 and class2. Since the classes are either A or B, I want the symbol shape to stay the same in both plots.
I am using the following code to try and do this:
library(ggplot2)
theme_set(theme_bw()) # omit grey background
qplot(x1, y1, data=df, shape=I(21), fill=I("gray"), size = I(4),alpha = I(0))+
stat_smooth(method="lm", se=FALSE, colour="black", size=1) + geom_point(shape=factor(class1), size=I(4))
qplot(x2, y2, data=df, shape=I(21), fill=I("gray"), size = I(4),alpha = I(0))+
stat_smooth(method="lm", se=FALSE, colour="black", size=1) + geom_point(shape=factor(class2), size=I(4))
It works fine if x1/y1 and x2/y2 have the same length: in that case the symbols stay the same in both plots. However, if the datasets have different lengths (as in the example dataframe above), a third symbol is introduced into the second plot.
Does anyone know how I can get the same symbols for A and B in both plots?
EDIT: If I try the method suggested below by Didzis Elferts
ggplot(df,aes(x1,y1,shape=class1))+geom_point(size=4)+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))
ggplot(df,aes(x2,y2,shape=class2))+geom_point(size=4)+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))
I get this error:
Error: Insufficient values in manual scale. 3 needed but only 2 provided.
EDIT 2: Didzis Elferts recommended the following solution
df$class2<-factor(df$class2,levels=c("A","B"))
However, when I try to add a regression line to each scatterplot using
ggplot(df,aes(x1,y1,shape=class1))+geom_point(size=4)+ scale_shape_manual(breaks=c("A","B"),values=c(15,16))+ stat_smooth(method="lm", se=FALSE, colour="black", size=1)
qplot(x2, y2, data=df, shape=class1)+ stat_smooth(method="lm", se=FALSE, colour="black", size=1) + geom_point(size=4)+ scale_shape_manual(breaks=c("A","B"),values=c(15,16))
ggplot2 adds a separate regression line for each class. Instead, I need just a single regression line based on the data from both classes together (even though they have different symbols).
Upvotes: 4
Views: 1549
Reputation: 98429
The problem is that in your data the empty cell in class2 is itself one of the factor levels.
str(df$class2)
Factor w/ 3 levels "","A","B": 3 2 2 2 3 2 2 3 3 1 ...
You can convert these empty cells to NA by redefining the factor levels.
df$class2<-factor(df$class2,levels=c("A","B"))
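As a quick check of what this re-leveling does, here is a minimal base R sketch. It rebuilds only the class2 column from the question's data; the name class2_clean is just an illustrative choice, not something from the original post.

```r
# Rebuild class2 as it appears in the question: nine labels, then five empty cells.
class2 <- factor(c("B", "A", "A", "A", "B", "A", "A", "B", "B",
                   "", "", "", "", ""))
levels(class2)   # the empty string shows up as a third level

# Re-declare the levels; any value not listed in `levels` becomes NA.
class2_clean <- factor(class2, levels = c("A", "B"))
levels(class2_clean)

# Count the observations per level, including the NA entries.
table(class2_clean, useNA = "ifany")
```

With only two levels left, a shape scale no longer needs a third value for the empty string.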
One way to ensure that both plots have the same legend and symbols is to use scale_shape_manual() and then set the breaks= and values= (shape symbols) you need.
ggplot(df,aes(x1,y1,shape=class1))+geom_point(size=4)+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))
ggplot(df,aes(x2,y2,shape=class2))+geom_point(size=4)+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))
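A possible variation, sketched here rather than taken from the answer: scale_shape_manual() also accepts a named vector for values=, which ties each shape to a class label by name instead of by position. The names class_shapes, df1 and df2 are illustrative only, and the data are retyped from the question with the empty class2 rows dropped.

```r
library(ggplot2)

# Illustrative shared mapping: the same shape for A and B in every plot.
class_shapes <- c(A = 15, B = 16)

# First dataset from the question (14 rows).
df1 <- data.frame(
  x1 = c(5, 3, 4, 5, 4, 8, 5, 6, 3, 4, 5, 6, 8, 4),
  y1 = c(7, 5, 6, 4, 1, 5, 6, 9, 8, 4, 5, 6, 7, 8),
  class1 = factor(c("A", "B", "A", "A", "B", "B", "B",
                    "B", "B", "A", "A", "A", "A", "A"))
)

# Second dataset from the question, keeping only the 9 complete rows.
df2 <- data.frame(
  x2 = c(4, 8, 7, 5, 6, 2, 5, 4, 5),
  y2 = c(7, 5, 1, 4, 5, 8, 4, 5, 8),
  class2 = factor(c("B", "A", "A", "A", "B", "A", "A", "B", "B"),
                  levels = c("A", "B"))
)

# Both plots reuse the same named shape vector.
p1 <- ggplot(df1, aes(x1, y1, shape = class1)) +
  geom_point(size = 4) +
  scale_shape_manual(values = class_shapes)

p2 <- ggplot(df2, aes(x2, y2, shape = class2)) +
  geom_point(size = 4) +
  scale_shape_manual(values = class_shapes)

print(p1)
print(p2)
```

Because the shapes are matched by name, the mapping stays the same even if one dataset happens to contain only one of the two classes.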
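To get only one regression line, place shape= inside the aes() of geom_point() rather than in the top-level ggplot() call. A shape mapping set in ggplot() is inherited by stat_smooth(), which then fits one line per class.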
ggplot(df,aes(x2,y2))+geom_point(size=4,aes(shape=class2))+
scale_shape_manual(breaks=c("A","B"),values=c(15,16))+
stat_smooth(method="lm", se=FALSE, colour="black", size=1)
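An alternative that should give the same single line, shown as a sketch and not as part of the original answer, is to keep shape= in the top-level aes() and override the grouping for the smoother only with aes(group = 1). The names df2, p_grouped and p_single are illustrative, and formula = y ~ x is spelled out only to avoid the informational message.

```r
library(ggplot2)

# Second dataset from the question, keeping only the 9 complete rows.
df2 <- data.frame(
  x2 = c(4, 8, 7, 5, 6, 2, 5, 4, 5),
  y2 = c(7, 5, 1, 4, 5, 8, 4, 5, 8),
  class2 = factor(c("B", "A", "A", "A", "B", "A", "A", "B", "B"),
                  levels = c("A", "B"))
)

base_plot <- ggplot(df2, aes(x2, y2, shape = class2)) +
  geom_point(size = 4) +
  scale_shape_manual(values = c(A = 15, B = 16))

# The smoother inherits the shape mapping, so it fits one line per class.
p_grouped <- base_plot +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

# group = 1 puts all points in a single group for this layer only.
p_single <- base_plot +
  stat_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, colour = "black")

print(p_single)
```

Either approach works; moving shape= into geom_point() keeps the mapping local to the points, while group = 1 is handy when other layers also need the class mapping.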
Upvotes: 5