Reputation: 39
I need to create a QQ plot of -log10 p-values in ggplot2 in which a subset of 137 points ("targets") is highlighted in gold, using a colorblind-friendly palette I call cbbPalette. I cannot use a different package because I eventually need to combine multiple QQ plots into a grid using grid.arrange from the gridExtra package, which works with ggplot2.
Setup:
library(ggplot2)
library(reshape2)
cbbPalette <- c("#E69F00", "#000000") #part of my palette; gold & black
set.seed(100)
The data consists of 100,137 p-values, 137 of which are targets:
p_values = c(
  runif(100000, min = 0, max = 1),
  runif(132, min = 1e-7, max = 1),
  c(6e-20, 6e-19, 7e-9, 7.5e-9, 4e-8)
)
# labels for the p-values
names_letters <- do.call(paste0, replicate(2, sample(LETTERS, 100137, TRUE), FALSE))
names = paste0(names_letters, sprintf("%04d", sample(9999, 100137, TRUE)))
targets = names[100001:100137] # last 137 are targets
df = as.data.frame(p_values)
df$names = names
df <- df[sample(nrow(df)), ] # shuffles the df to place targets randomly within it
df$Category = ifelse(df$names %in% targets, "Target", "Non-Target")
Appearance of Data:
head(df, 4)
       p_values  names   Category
89863 0.4821147 NZ3385 Non-Target
20209 0.3998835 SQ3793 Non-Target
29200 0.7893478 ZT5497 Non-Target
71623 0.3459360 QF5311 Non-Target
Melted df using reshape2, with observed (o) and expected (e) -log10 p-values:
df.m = melt(df)
df.m$o = -log10(sort(df.m$value, decreasing = F))
df.m$e = -log10(1:nrow(df.m) / nrow(df.m))
Appearance of Melted df:
head(df.m,4)
   names   Category variable     value         o        e
1 NZ3385 Non-Target p_values 0.4821147 19.221849 5.000595
2 SQ3793 Non-Target p_values 0.3998835 18.221849 4.699565
3 ZT5497 Non-Target p_values 0.7893478  8.154902 4.523473
4 QF5311 Non-Target p_values 0.3459360  8.124939 4.398535
QQ-plot
df_qq = ggplot(df.m, aes(e, o)) +
  geom_point(aes(color = Category)) +
  scale_colour_manual(values = cbbPalette) +
  geom_abline(intercept = 0, slope = 1) +
  ylab("Observed -log[10](p)") +
  xlab("Theoretical -log[10](p)")
The resulting QQ plot shows no highlighting of my 137 targets.
Upvotes: 1
Views: 1463
Reputation: 16832
If you want to avoid splitting your data frame across two calls to geom_point, you can order the data by the Category column first, then pipe it into ggplot (arrange and %>% come from the dplyr package). With just these two category values, a simple arrange is enough:
df.m %>%
  arrange(Category) %>%
  ggplot(...)
This sorts the data alphabetically by Category, so Non-Target observations come first and Target ones last. Points are drawn in row order, so the Target points end up on top.
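One caveat on reordering rows, offered as a sketch rather than a definitive diagnosis: in the question, o is taken from the sorted p-values and assigned back to rows that are still in shuffled order, so Category may no longer describe the point it is plotted with. The head(df.m) output hints at this, since a Non-Target row with value 0.4821147 carries o = 19.221849, which matches the 6e-20 target. Sorting whole rows by p-value before computing o and e keeps each label attached to its own point. The toy data frame below is hypothetical and only stands in for df.

```r
# Toy stand-in for df (hypothetical values; one row plays the target)
toy <- data.frame(
  p_values = c(0.50, 1e-8, 0.20, 0.90),
  Category = c("Non-Target", "Target", "Non-Target", "Non-Target"),
  stringsAsFactors = FALSE
)

# Sort whole rows by p-value so each label travels with its own p-value
toy <- toy[order(toy$p_values), ]

# Observed and expected -log10 p-values, now row-aligned with Category
toy$o <- -log10(toy$p_values)
toy$e <- -log10(seq_len(nrow(toy)) / nrow(toy))
```

Once o and e are stored per row like this, re-sorting the rows by Category for draw order does not break the pairing, because each row keeps its own (e, o) values.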
To have more control over the ordering, you can make Category a factor, and set the levels explicitly, then arrange by the factor order:
df.m %>%
  mutate(Category = as.factor(Category) %>% fct_relevel("Target")) %>%
  arrange(desc(Category)) %>%
  ggplot(...)
I'm using fct_relevel from the forcats package because it's an easy way to manipulate factor levels; you could order the levels with base R as well. fct_relevel puts the Target level first, so I arrange by Category in descending order so that, again, Target gets drawn last.
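The base R route mentioned above might look like the following sketch. The toy data frame is a hypothetical stand-in for df.m; factor() sets the level order explicitly and order() sorts the rows so that Target comes last.

```r
# Toy stand-in for df.m (hypothetical data, only to show the ordering)
toy <- data.frame(
  Category = c("Target", "Non-Target", "Target", "Non-Target"),
  o = c(4, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Put "Target" first in the level order, as fct_relevel("Target") would
toy$Category <- factor(toy$Category, levels = c("Target", "Non-Target"))

# Sort by descending level position so "Target" rows come last
# and are therefore drawn on top
toy_sorted <- toy[order(-as.integer(toy$Category)), ]
```

toy_sorted can then be passed to ggplot() in place of the piped data.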
Hope that makes sense!
Upvotes: 1
Reputation: 60060
You can draw the targets in a separate geom_point() call after the non-targets. Geoms are plotted in order, so the targets end up on top:
cbbPalette <- c(Target = "#E69F00", `Non-Target` = "#000000")
df_qq = ggplot(df.m, aes(e, o)) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(aes(color = Category), data = df.m[df.m$Category == "Non-Target", ]) +
  geom_point(aes(color = Category), data = df.m[df.m$Category == "Target", ]) +
  scale_colour_manual(values = cbbPalette) +
  ylab("Observed -log[10](p)") +
  xlab("Theoretical -log[10](p)")
I've also added names to your palette to make sure the right colours are attached to each category; otherwise they can get mixed up when you change the order of the geom_point() calls.
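To see why the names matter, here is a small base R sketch that mimics the matching without calling ggplot2. It assumes, as the scale_colour_manual() documentation describes, that unnamed values are matched by position against the scale's limits, which for character data are usually in alphabetical order. The variable names are illustrative only.

```r
# Categories as they appear in the question's data
categories <- c("Target", "Non-Target")

# Character values are ordered alphabetically when the discrete scale is
# built, so an unnamed palette is matched by position against this order
sorted_levels <- sort(unique(categories))
unnamed <- c("#E69F00", "#000000")  # gold, black
by_position <- setNames(unnamed, sorted_levels)

# A named palette is looked up by category name instead
named <- c(Target = "#E69F00", `Non-Target` = "#000000")

by_position[["Target"]]  # "#000000": gold went to "Non-Target"
named[["Target"]]        # "#E69F00"
```

Under that assumption, the unnamed palette from the question would colour the non-targets gold and the targets black, which is the opposite of what was intended.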
Upvotes: 1