Neel

Reputation: 1

Volcano plot - colors

I am trying to plot a volcano plot with ggplot2. I would like to have three different colors based on the following criteria:

  1. qvalue < 0.05 and meth.diff > 25% = Red
  2. qvalue < 0.05 and meth.diff < -25% (minus 25%) = Green
  3. qvalue < 0.05 and meth.diff between +25% and -25% = Gray

Similar questions have been asked here before and I tried following them but keep getting error messages. Any suggestions would be highly appreciated.

Here is an excerpt of the data (myDiff1p), followed by my code and the error I get:

    chr  start    end strand       pvalue       qvalue  meth.diff
16 chr1  37801  38100      * 2.246550e-05 4.487042e-04 -36.485769
17 chr1  38101  38400      * 5.699781e-06 1.376471e-04  55.755181
29 chr1  49501  49800      * 1.453030e-18 2.442391e-16 -18.381131
35 chr1  62701  63000      * 5.547627e-03 3.686303e-02 -31.871711
54 chr1 122401 122700      * 3.917230e-03 2.845933e-02  63.443366
57 chr1 130201 130500      * 8.941091e-04 9.253737e-03  -8.347167

myDiff1p$threshold = factor(ifelse(myDiff1p$meth.diff>25 & myDiff1p$qvalue< 0.05, 1, 
  ifelse(myDiff1p$meth.diff<-25 & myDiff1p$qvalue< 0.05,-1,0)))

ggplot(data=myDiff1p, aes(x=meth.diff, y=-log10(qvalue))) + 
  geom_point(aes(color=myDiff1p$threshold), alpha=0.4, size=1.75)+ 
  geom_vline(xintercept=c(-25,25), color="red", alpha=1.0)+ 
  geom_hline(yintercept=2, color="blue", alpha=1.0)+ 
  xlab("Differential Methylation")+ 
  ylab("-log10 (qvalue)")+ 
  theme_bw()+
  xlim(c(-75, 75)) + 
  ylim(c(0, 300))

Error: Discrete value supplied to continuous scale

Upvotes: 0

Views: 2927

Answers (1)

Z.Lin

Reputation: 29085

There is an easy-to-miss mistake in this line:

myDiff1p$threshold = factor(ifelse(myDiff1p$meth.diff>25 & myDiff1p$qvalue< 0.05, 1, 
   ifelse(myDiff1p$meth.diff<-25 & myDiff1p$qvalue< 0.05,-1,0)))

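Because there is no space in myDiff1p$meth.diff<-25, R reads <- as the assignment operator rather than as < followed by -25. Since <- binds more loosely than &, the inner test is parsed as myDiff1p$meth.diff <- (25 & myDiff1p$qvalue < 0.05), so meth.diff is overwritten with logical values instead of being compared against -25. That would also explain the error: ggplot2 treats a logical column as discrete, while xlim() expects a continuous scale. Write myDiff1p$meth.diff < -25 instead, and re-create myDiff1p from the original data before re-running, since the column has already been overwritten.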
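To see how R reads the two spellings, you can parse them without evaluating them. This is only a small base R sketch; x and q are made-up stand-ins for meth.diff and qvalue, not your real data.

```r
# Parse (without evaluating) both spellings to see how R reads them.
no_space   <- quote(x<-25 & q < 0.05)
with_space <- quote(x < -25 & q < 0.05)

# The outermost call of the unspaced version is the assignment operator,
# while the spaced version is an ordinary logical AND of two comparisons.
as.character(no_space[[1]])
as.character(with_space[[1]])

# Evaluating the unspaced version inside ifelse() silently overwrites x.
x <- c(-36.5, 55.8, -18.4)    # hypothetical meth.diff values
q <- c(0.0004, 0.0001, 0.2)   # hypothetical q-values
res <- ifelse(x<-25 & q < 0.05, -1, 0)

# x is no longer numeric after the call above.
class(x)
```

The same check works on any expression you are unsure about: quote() it and inspect the first element of the result.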

Here's what I recommend:

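library(dplyr)
library(ggplot2)

# Reload or re-create myDiff1p first if the earlier line already overwrote meth.diff.
# Label each point by condition; anything that meets neither cutoff falls into "cond3".
myDiff1p <- myDiff1p %>%
  mutate(threshold = factor(case_when(meth.diff > 25 & qvalue < 0.05 ~ "cond1",
                                      meth.diff < -25 & qvalue < 0.05 ~ "cond2",
                                      TRUE ~ "cond3")))

ggplot(data = myDiff1p, aes(x = meth.diff, y = -log10(qvalue))) +
  # Refer to the column by its bare name inside aes() rather than myDiff1p$threshold.
  geom_point(aes(color = threshold), alpha = 0.4, size = 1.75) +
  geom_vline(xintercept = c(-25, 25), color = "red", alpha = 1.0) +
  geom_hline(yintercept = 2, color = "blue", alpha = 1.0) +
  xlab("Differential Methylation") +
  ylab("-log10 (qvalue)") +
  theme_bw() +
  xlim(c(-75, 75)) +
  ylim(c(0, 300)) +
  # Map each condition to its colour by name.
  scale_color_manual(name = "Threshold",
                     values = c("cond1" = "red", "cond2" = "green", "cond3" = "grey"))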


I labelled the threshold factor by condition and defined the condition-to-colour mapping as a named vector in scale_color_manual(). As a matter of personal preference, I also find dplyr::case_when() neater than nested ifelse() calls.
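If you want to sanity-check the classification without dplyr or a plot, here is a minimal base R sketch on the six rows from the question. The data frame name d and the vector name cols are just illustrative; the cutoffs and colours are the ones described above.

```r
# Six example rows copied from the question (only the columns used here).
d <- data.frame(
  qvalue    = c(4.487042e-04, 1.376471e-04, 2.442391e-16,
                3.686303e-02, 2.845933e-02, 9.253737e-03),
  meth.diff = c(-36.485769, 55.755181, -18.381131,
                -31.871711, 63.443366, -8.347167)
)

# Start every row at "cond3", then overwrite the rows that pass a cutoff.
# Note the spaces around `<`, so that `< -25` is not read as `<-`.
d$threshold <- "cond3"
d$threshold[d$qvalue < 0.05 & d$meth.diff > 25]  <- "cond1"
d$threshold[d$qvalue < 0.05 & d$meth.diff < -25] <- "cond2"
d$threshold <- factor(d$threshold, levels = c("cond1", "cond2", "cond3"))

# Named vector: each level is tied to its colour by name, not by position.
cols <- c(cond1 = "red", cond2 = "green", cond3 = "grey")

# Count the points per condition and look up the colour for each row.
table(d$threshold)
unname(cols[as.character(d$threshold)])
```

Because the colours are looked up by name, reordering the factor levels or dropping an empty level should not change which colour a condition gets.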

Upvotes: 2
