DeltaIV

Reputation: 5646

Plot two different data frames, grouped by one or more variables, with different labels in the legend

Consider the following sample data frames:

x1=seq(2,7,length.out=13)
y1a=1.5*x1+4; y2a=1.5*x1+7;y3a=1.5*x1+9;
X1=rep(x1,3)
Y1=c(y1a,y2a,y3a)
groups1=rep(c("A","B","C"),each=13)
df1=data.frame(groups1,X1,Y1)

x2=seq(4,10,length.out=10)
y1b=3*x2+4; y2b=3*x2+7;y3b=3*x2+9;
X2=rep(x2,3)
Y2=c(y1b,y2b,y3b)
groups2=rep(c("A","B","C"),each=10)
df2=data.frame(groups2,X2,Y2)

Plotting them with ggplot2 gives me an automatic legend, which is great. However, the legend is shared by both data frames, which is not so great:

library(ggplot2)

# lines for df1, points for df2; both map their group column to color
p <- ggplot()
p <- p + geom_line(data=df1,aes(x=X1, y=Y1, color = groups1)) +
     geom_point(data=df2,aes(x=X2, y=Y2, color = groups2))
p


Instead, I would like the legend to distinguish between curves coming from df1 and curves coming from df2, labeling the former "Pred" and the latter "Test". How can I do that? Note that the real data frames are much larger and very different (one has ~400 rows x 10 columns, the other ~90 rows x 30 columns), so merging them wouldn't be simple.

Upvotes: 2

Views: 1982

Answers (2)

Jaap

Reputation: 83255

An easy alternative is to draw the points with a fillable shape (shape 21), map the groups to fill instead of color, and set the outline color to NA, so that the points look the same as the default ones:

ggplot() + 
  geom_line(data=df1,aes(x=X1, y=Y1, color = groups1)) +
  geom_point(data=df2,aes(x=X2, y=Y2, fill = groups2), shape=21, color=NA) +
  scale_color_discrete("Pred") +
  scale_fill_discrete("Test")

This gives two separate legends: one titled "Pred" for the lines and one titled "Test" for the points.


Another possibility is to use different linetypes for the different datasets:

ggplot() + 
  geom_line(data=df1,aes(x=X1, y=Y1, color = groups1, linetype = "Pred")) +
  geom_line(data=df2,aes(x=X2, y=Y2, color = groups2, linetype = "Test")) +
  scale_color_discrete("Groups") +
  scale_linetype_discrete("Datasets")

This gives a color legend titled "Groups" and a linetype legend titled "Datasets" with the entries "Pred" and "Test".
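
If df2 should stay as points, as in the question, the same trick of mapping a constant inside aes() might be applied to shape as well. The following is only a sketch under that assumption, rebuilding the sample data from the question; removing the legend titles with labs(linetype = NULL, shape = NULL) is a layout choice made here, not part of the original answer:

```r
library(ggplot2)

# Sample data, rebuilt as in the question
x1 <- seq(2, 7, length.out = 13)
df1 <- data.frame(groups1 = rep(c("A", "B", "C"), each = 13),
                  X1 = rep(x1, 3),
                  Y1 = c(1.5 * x1 + 4, 1.5 * x1 + 7, 1.5 * x1 + 9))
x2 <- seq(4, 10, length.out = 10)
df2 <- data.frame(groups2 = rep(c("A", "B", "C"), each = 10),
                  X2 = rep(x2, 3),
                  Y2 = c(3 * x2 + 4, 3 * x2 + 7, 3 * x2 + 9))

# A constant string inside aes() creates a one-entry legend for that aesthetic:
# linetype labels the lines "Pred", shape labels the points "Test"
p <- ggplot() +
  geom_line(data = df1,
            aes(x = X1, y = Y1, color = groups1, linetype = "Pred")) +
  geom_point(data = df2,
             aes(x = X2, y = Y2, color = groups2, shape = "Test")) +
  labs(color = "Groups", linetype = NULL, shape = NULL)
p
```

The color legend still lists A, B and C once, while the two small extra legends say which geom belongs to which dataset.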

Upvotes: 4

Heroka

Reputation: 13149

Generally, ggplot2 draws one legend per aesthetic, so out of the box you cannot have two separate color legends.

You can, however, give each combination of data frame and group its own color by using interaction():

df1$group <- 1
df2$group <- 2


p <- ggplot() + geom_line(data=df1,aes(x=X1, y=Y1, color = interaction(group,groups1))) +
  geom_point(data=df2,aes(x=X2, y=Y2, color = interaction(group,groups2)))
p

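
To get the "Pred" and "Test" labels the question asks for, the numeric marker could be swapped for a string, so that the legend entries read "Pred.A", "Test.A" and so on. This is a sketch under that assumption, with the sample data rebuilt from the question; the column name dataset and the legend title are illustrative choices, not part of the original answer:

```r
library(ggplot2)

# Sample data, rebuilt as in the question
x1 <- seq(2, 7, length.out = 13)
df1 <- data.frame(groups1 = rep(c("A", "B", "C"), each = 13),
                  X1 = rep(x1, 3),
                  Y1 = c(1.5 * x1 + 4, 1.5 * x1 + 7, 1.5 * x1 + 9))
x2 <- seq(4, 10, length.out = 10)
df2 <- data.frame(groups2 = rep(c("A", "B", "C"), each = 10),
                  X2 = rep(x2, 3),
                  Y2 = c(3 * x2 + 4, 3 * x2 + 7, 3 * x2 + 9))

# Hypothetical marker column: a label per data frame instead of 1 and 2
df1$dataset <- "Pred"
df2$dataset <- "Test"

# interaction() pastes the two columns into levels such as "Pred.A"
p <- ggplot() +
  geom_line(data = df1,
            aes(x = X1, y = Y1, color = interaction(dataset, groups1))) +
  geom_point(data = df2,
             aes(x = X2, y = Y2, color = interaction(dataset, groups2))) +
  labs(color = "Dataset.Group")
p
```

Both layers share a single color legend with six entries, so this keeps one legend at the cost of no longer matching colors between the two data frames.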

Upvotes: 2
