DigiPath

Reputation: 179

R - reformat P value in ggplot using 'stat_compare_means'

I want to add the p-values to each panel of a faceted ggplot. If a p-value is larger than 0.05, I want to display it as it is. If it is smaller than 0.05, I want to display it in scientific notation (e.g., 0.0032 -> 3.20e-3; 0.0000425 -> 4.25e-5).

The code I wrote to do this is:

p1 <- ggplot(data = CD3, aes(location, value, color = factor(location),
                             fill = factor(location))) + 
  theme_bw(base_rect_size = 1) +
  geom_boxplot(alpha = 0.3, size = 1.5, show.legend = FALSE) +
  geom_jitter(width = 0.2, size = 2, show.legend = FALSE) +
  scale_color_manual(values=c("#4cdee6", "#e47267", "#13ec87")) +
  scale_fill_manual(values=c("#4cdee6", "#e47267", "#13ec87")) +
  ylab(expression(paste("Density of clusters, ", mm^{-2}))) +
  xlab(NULL) +
  stat_compare_means(comparisons = list(c("CT", 'N'), c("IF","N")), 
                     aes(label = ifelse(..p.format.. < 0.05, formatC(..p.format.., format = "e", digits = 2),
                                        ..p.format..)), 
                     method = 'wilcox.test', show.legend = FALSE, size = 10) +
  #ylab(expression(paste('Density, /', mm^2, )))+
  theme(axis.text = element_text(size = 10), 
        axis.title = element_text(size = 20), 
        legend.text = element_text(size = 38), 
        legend.title = element_text(size = 40), 
        strip.background = element_rect(colour="black", fill="white", size = 2),
        strip.text = element_text(margin = margin(10, 10, 10, 10), size = 40),
        panel.grid = element_line(size = 1.5))
plot(p1)

This code runs without error, but the format of the numbers isn't changed. What am I doing wrong?

I attached the data to reproduce the plot: download data here

EDIT

structure(list(value = c(0.931966449207829, 3.24210526315789, 
3.88811650210901, 0.626860993574675, 4.62085308056872, 0.477508650519031, 
0.111900110501359, 3.2495164410058, 4.06626506024096, 0.21684918139434, 
1.10365086026018, 4.66666666666667, 0.174109967855698, 0.597625869832174, 
2.3758865248227, 0.360751947840548, 1.00441501103753, 3.65168539325843
), Criteria = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Density", "Density of cluster", 
"nodular count", "Elongated count"), class = "factor"), Case = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L), .Label = c("Case 1A", "Case 1B", "Case 2", "Case 3", "Case 4", 
"Case 5"), class = "factor"), Mark = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CD3", 
"CD4", "CD8", "CD20", "FoxP3"), class = "factor"), location = structure(c(3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L), .Label = c("CT", "IF", "N"), class = "factor")), row.names = c(91L, 
92L, 93L, 106L, 107L, 108L, 121L, 122L, 123L, 136L, 137L, 138L, 
151L, 152L, 153L, 166L, 167L, 168L), class = "data.frame")

Upvotes: 2

Views: 8664

Answers (1)

dc37

Reputation: 16178

I think the issue comes from stat_compare_means and its comparisons argument. I'm not totally sure, but my guess is that the p-value output of stat_compare_means differs from that of compare_means, so you can't use it in the aes for label.

Let me explain. With your example, you can change how the p-value is displayed like this:

library(ggplot2)
library(ggpubr)
# df is the data frame from the question's EDIT (the dput() output)
ggplot(df, aes(x = location, y = value, color = location)) +
  geom_boxplot() +
  stat_compare_means(ref.group = "N",
                     aes(label = ifelse(p < 0.05, sprintf("p = %2.1e", as.numeric(..p.format..)),
                                        ..p.format..)))


This displays the p-value correctly, but you lose the bars. If you use the comparisons argument instead, you get:

library(ggplot2)
library(ggpubr)
ggplot(df, aes(x = location, y = value, color = location)) +
  geom_boxplot() +
  stat_compare_means(comparisons = list(c("CT", "N"), c("IF", "N")),
                     aes(label = ifelse(p < 0.05, sprintf("p = %2.1e", as.numeric(..p.format..)),
                                        ..p.format..)))


Now you get the bars, but the p-values are not shown in the format you want.

To circumvent this issue, you can compute the statistics outside of ggplot2 with the compare_means function and use the ggsignif package to draw the bars with correctly formatted labels.

Here, I'm using dplyr and its mutate function to create the new columns, but you can easily do the same in base R.

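library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe
library(ggpubr)    # provides compare_means()
c <- compare_means(value ~ location, data = df, ref.group = "N")
# Add a y position for each bar and a label in scientific notation when p < 0.05
c %<>% mutate(y_pos = c(5, 5.5), labels = ifelse(p < 0.05, sprintf("%2.1e", p), p))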

# A tibble: 2 x 10
  .y.   group1 group2       p p.adj p.format p.signif method   y_pos labels 
  <chr> <chr>  <chr>    <dbl> <dbl> <chr>    <chr>    <chr>    <dbl> <chr>  
1 value N      CT     0.00866 0.017 0.0087   **       Wilcoxon   5   8.7e-03
2 value N      IF     0.00866 0.017 0.0087   **       Wilcoxon   5.5 8.7e-03
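
For the base R route mentioned above, a minimal sketch could look like the following. It is an illustration under stated assumptions, not the method used in this answer: it calls wilcox.test directly instead of compare_means, the names format_p and stats_base are made up for the example, and the data are the question's values rounded to three decimals. The helper also trims the zero-padded exponent so that labels follow the format asked for in the question (3.20e-3 rather than 3.20e-03).

```r
# Question's data, rounded to three decimals (assumption: rounding keeps the rank order).
df <- data.frame(
  value = c(0.932, 3.242, 3.888, 0.627, 4.621, 0.478,
            0.112, 3.250, 4.066, 0.217, 1.104, 4.667,
            0.174, 0.598, 2.376, 0.361, 1.004, 3.652),
  location = factor(rep(c("N", "CT", "IF"), times = 6),
                    levels = c("CT", "IF", "N"))
)

# Hypothetical helper: scientific notation below 0.05, plain number otherwise.
format_p <- function(p, digits = 2) {
  sci <- sprintf(paste0("%.", digits, "e"), p)
  # Drop leading zeros in the exponent: "3.20e-03" becomes "3.20e-3".
  sci <- sub("e([+-])0+([0-9])", "e\\1\\2", sci)
  ifelse(p < 0.05, sci, as.character(signif(p, digits + 1)))
}

ref <- "N"
others <- c("CT", "IF")

# One Wilcoxon rank-sum test per comparison against the reference group.
p_vals <- vapply(others, function(g) {
  wilcox.test(df$value[df$location == ref],
              df$value[df$location == g])$p.value
}, numeric(1))

# Same columns that the plotting step reads: group1, group2, y_pos, labels.
stats_base <- data.frame(
  group1 = ref,
  group2 = others,
  p = unname(p_vals),
  y_pos = c(5, 5.5),
  stringsAsFactors = FALSE
)
stats_base$labels <- format_p(stats_base$p)
print(stats_base)
```

With these data, both comparisons give p = 8/924 (about 0.00866), so both labels come out as 8.66e-3. A data frame built this way should be usable in place of c in the plotting step, since it carries the same group1, group2, y_pos and labels columns.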

Then, you can plot it:

library(ggplot2)
library(ggpubr)
library(ggsignif)
ggplot(df, aes(x = location, y = value))+
  geom_boxplot(aes(colour = location))+
  ylim(0,6)+
  geom_signif(data = as.data.frame(c), aes(xmin=group1, xmax=group2, annotations=labels, y_position=y_pos),
                manual = TRUE)


Does this look like what you are trying to plot?
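
A side note on why the label expressions above test p and wrap ..p.format.. in as.numeric(): the p.format column is character (it is listed as <chr> in the compare_means output), and R compares a character string with a number as text, not numerically. A condition such as ..p.format.. < 0.05, as in the question, may therefore misfire for some values. The sketch below uses made-up strings (not output from ggpubr) to show the difference:

```r
# Hypothetical formatted p-values stored as text, the way a p.format column stores them.
p_chr <- c("0.0087", "1e-04", "0.2")

# Character vs. numeric: 0.05 is coerced to "0.05" and the comparison is lexical.
as_text <- p_chr < 0.05

# Converting to numeric first gives the intended numeric comparison.
as_number <- as.numeric(p_chr) < 0.05

print(data.frame(p_chr, as_text, as_number))
```

The second value is the telling one: "1e-04" sorts after "0.05" as text, so the text comparison reports FALSE even though 1e-04 is well below 0.05.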

Upvotes: 5
