dan

Reputation: 6314

Add a legend for geom_polygon

I'm trying to produce a scatter plot with geom_point where the points are circumscribed by a smoothed polygon, with geom_polygon.

Here's my point data:

set.seed(1)
df <- data.frame(x=c(rnorm(30,-0.1,0.1),rnorm(30,0,0.1),rnorm(30,0.1,0.1)),y=c(rnorm(30,-1,0.1),rnorm(30,0,0.1),rnorm(30,1,0.1)),val=rnorm(90),cluster=c(rep(1,30),rep(2,30),rep(3,30)),stringsAsFactors=F)

I color each point according to the interval that df$val falls in. Here's the interval data:

intervals.df <- data.frame(interval=c("(-3,-2]","(-2,-0.999]","(-0.999,0]","(0,1.96]","(1.96,3.91]","(3.91,5.87]","not expressed"),
                           start=c(-3,-2,-0.999,0,1.96,3.91,NA),end=c(-2,-0.999,0,1.96,3.91,5.87,NA),
                           col=c("#2f3b61","#436CE8","#E0E0FF","#7d4343","#C74747","#EBCCD6","#D3D3D3"),stringsAsFactors=F)

Assigning colors and intervals to the points:

df <- cbind(df,do.call(rbind,lapply(df$val,function(x){
  if(is.na(x)){
    return(data.frame(col=intervals.df$col[nrow(intervals.df)],interval=intervals.df$interval[nrow(intervals.df)],stringsAsFactors=F))
  } else{
    idx <- which(intervals.df$start < x & x <= intervals.df$end)  # half-open (start, end], matching the interval labels
    return(data.frame(col=intervals.df$col[idx],interval=intervals.df$interval[idx],stringsAsFactors=F))
  }
})))
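
The row-by-row lookup above could also be done in one vectorized step with base R's cut(). This is only a minimal sketch: it assumes the same break points as intervals.df and uses a short hypothetical vector val in place of df$val. Note that any value falling outside the breaks would also end up as "not expressed".

```r
# Break points and labels copied from intervals.df (the NA row is handled separately)
breaks <- c(-3, -2, -0.999, 0, 1.96, 3.91, 5.87)
labels <- c("(-3,-2]", "(-2,-0.999]", "(-0.999,0]",
            "(0,1.96]", "(1.96,3.91]", "(3.91,5.87]")

# Hypothetical values standing in for df$val
val <- c(-2.5, -1, 0.5, 2, NA)

# cut() uses right-closed intervals (a, b] by default, matching the labels
interval <- as.character(cut(val, breaks = breaks, labels = labels))

# Missing values (and anything outside the breaks) get the catch-all level
interval[is.na(interval)] <- "not expressed"
interval <- factor(interval, levels = c(labels, "not expressed"))
```

The col column could then be filled with intervals.df$col[match(interval, intervals.df$interval)].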

Preparing the colors for the legend, which will show each interval:

df$interval <- factor(df$interval,levels=intervals.df$interval)
colors <- intervals.df$col
names(colors) <- intervals.df$interval

Here's where I constructed the smoothed polygons (using a function courtesy of this link):

clusters <- sort(unique(df$cluster))
cluster.cols <- c("#ff00ff","#088163","#ccbfa5")


splinePolygon <- function(xy,vertices,k=3, ...)
{
  # Assert: xy is an n by 2 matrix with n >= k.
  # Wrap k vertices around each end.
  n <- dim(xy)[1]
  if (k >= 1) {
    data <- rbind(xy[(n-k+1):n,], xy, xy[1:k, ])
  } else {
    data <- xy
  }
  # Spline the x and y coordinates.
  data.spline <- spline(1:(n+2*k), data[,1], n=vertices, ...)
  x <- data.spline$x
  x1 <- data.spline$y
  x2 <- spline(1:(n+2*k), data[,2], n=vertices, ...)$y
  # Retain only the middle part.
  cbind(x1, x2)[k < x & x <= n+k, ]
}

library(data.table)
hulls.df <- do.call(rbind,lapply(1:length(clusters),function(l){
  dt <- data.table(df[which(df$cluster==clusters[l]),])
  hull <- dt[, .SD[chull(x,y)]]
  spline.hull <- splinePolygon(cbind(hull$x,hull$y),100)
  return(data.frame(x=spline.hull[,1],y=spline.hull[,2],val=NA,cluster=clusters[l],col=cluster.cols[l],interval=NA,stringsAsFactors=F))
}))
hulls.df$cluster <- factor(hulls.df$cluster,levels=clusters)
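
For reference, the hull step relies on chull() from grDevices, which returns the row indices of the points lying on the convex hull. A minimal sketch on hypothetical toy points (not the data above), showing the same subsetting without data.table:

```r
# Hypothetical toy points: four corners of a unit square plus one interior point
pts <- data.frame(x = c(0, 1, 1, 0, 0.5),
                  y = c(0, 0, 1, 1, 0.5))

# chull() returns the indices of the hull vertices, in clockwise order
hull.idx <- chull(pts$x, pts$y)

# Subset the rows in hull order; the interior point (row 5) is dropped
hull <- pts[hull.idx, ]
```

Because the indices come back in order around the hull, the subset can be passed straight to a polygon-drawing or smoothing function.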

And here's my ggplot command:

library(ggplot2)

p <- ggplot(df,aes(x=x,y=y,colour=interval))+geom_point(cex=2,shape=1,stroke=1)+labs(x="X", y="Y")+theme_bw()+theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())+scale_color_manual(drop=FALSE,values=colors,name="DE")
p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster),color=hulls.df$col,fill=NA)

which produces a scatter plot with each cluster outlined by its smoothed polygon, but with a legend for the point intervals only.

My question is: how do I add a legend for the polygons under the legend for the points? I want a legend with 3 lines colored according to the cluster colors, with the corresponding cluster number beside each line.

Upvotes: 4

Views: 4669

Answers (3)

cuttlefish44

Reputation: 6796

Say you want to add a legend for the_factor. My basic idea is:

(1) put the_factor into the mapping through an otherwise unused aesthetic: aes(xx = the_factor)
(2) if (1) changes the plot's appearance, cancel the effect with scale_xx_manual()
(3) modify the legend with guides(xx = guide_legend(override.aes = list()))

In your case, aes(fill) and aes(alpha) are unused. fill is the better choice because, with the hollow point shape used here, it has no visible effect. So I used aes(fill=as.factor(cluster)).

p <- ggplot(df,aes(x=x,y=y,colour=interval, fill=as.factor(cluster))) +   # add aes(fill=...)
  geom_point(cex=2, shape=1, stroke=1) + 
  labs(x="X", y="Y",fill="cluster") +          # add fill="cluster"
  theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
  guides(fill = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add

p <- p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)


Of course, you can make the same graph by using aes(alpha = the_factor). Because alpha does change how the points are drawn, you need to neutralize it with scale_alpha_manual().

g <- ggplot(df, aes(x=x,y=y,colour=interval)) +
  geom_point(cex=2, shape=1, stroke=1, aes(alpha=as.factor(cluster))) +  # add aes(alpha)
  labs(x="X", y="Y",alpha="cluster") +          # add alpha="cluster"
  theme_bw() + theme(legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank()) + scale_color_manual(drop=FALSE,values=colors,name="DE") +
  scale_alpha_manual(values=c(1,1,1)) +         # add
  guides(alpha = guide_legend(override.aes = list(colour = cluster.cols, pch=0))) # add

g <- g+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster), color=hulls.df$col,fill=NA)
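
A variation on the same idea, sketched under the assumption that it is acceptable to map the spare aesthetic on the polygon layer itself: linetype is mapped to the cluster, forced to solid for every level, and the legend keys are recoloured with override.aes. The data frames pts and hull below are hypothetical toy data, not the question's df and hulls.df.

```r
library(ggplot2)

# Hypothetical toy data standing in for df and hulls.df
pts <- data.frame(x = c(0.2, 0.8, 2.3, 2.7), y = c(0.5, 0.4, 2.6, 2.2))
hull <- data.frame(
  x = c(0, 1, 1, 0, 2, 3, 3, 2),
  y = c(0, 0, 1, 1, 2, 2, 3, 3),
  cluster = factor(rep(1:2, each = 4))
)
cluster.cols <- c("#ff00ff", "#088163")

p <- ggplot(pts, aes(x = x, y = y)) +
  geom_point() +
  # linetype is otherwise unused, so it can carry the cluster legend
  geom_polygon(data = hull,
               aes(x = x, y = y, group = cluster, linetype = cluster),
               colour = cluster.cols[hull$cluster], fill = NA) +
  # keep every outline solid so the mapping has no visible effect
  scale_linetype_manual(values = c("solid", "solid"), name = "cluster") +
  # recolour the legend keys to match the outlines
  guides(linetype = guide_legend(override.aes = list(colour = cluster.cols)))
```

Since the points layer has no linetype, the extra legend should show keys for the polygon layer only.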


Upvotes: 3

Richard Telford

Reputation: 9923

What you are asking for is two colour scales. My understanding is that this is not possible. But you can give the impression of having two colour scales with a bit of a cheat and using the filled symbols (shapes 21 to 25).

p <- ggplot(df, aes(x = x, y = y, fill = interval)) +
  geom_point(cex = 2, shape = 21, stroke = 1, colour = NA)+
  labs(x = "X", y = "Y") +
  theme_bw() +
  theme(legend.key = element_blank(), panel.border = element_blank(), strip.background = element_blank()) +
  scale_fill_manual(drop=FALSE, values=colors, name="DE") + 
  geom_polygon(data = hulls.df, aes(x = x, y = y, colour = cluster), fill = NA) + 
  scale_colour_manual(values = cluster.cols)
p

Alternatively, use a filled polygon with a low alpha:

p <- ggplot(df,aes(x=x,y=y,colour=interval))+
  geom_point(cex=2,shape=1,stroke=1)+
  labs(x="X", y="Y")+
  theme_bw() +
  theme(legend.key = element_blank(),panel.border=element_blank(), strip.background=element_blank()) +
  scale_color_manual(drop=FALSE,values=colors,name="DE", guide = guide_legend(override.aes = list(fill = NA))) +
  geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill = cluster), alpha = 0.2, show.legend = TRUE) +
  scale_fill_manual(values = cluster.cols)
p

But this might make the point colours difficult to see.

Upvotes: 2

Sandipan Dey

Reputation: 23129

If a slightly different output is acceptable, changing only the last line of your code may serve your purpose:

p+geom_polygon(data=hulls.df,aes(x=x,y=y,group=cluster, fill=cluster),alpha=0.1)


Upvotes: 3
