katkat
katkat

Reputation: 11

How to Add Significance bars a Bar Plot that have percentage/logical data

I have a dataframe with a categorical variable and a few logical variables. I’ve created an example using the iris dataset:

# Create a new dataframe showing the percentage each variable that is considered long 
# Numerical variables are first converted to logical based on if they are above or below a certain threshold, then the percentage of each variable that is true is calculated 

long_data <- iris %>%
  mutate(
    Long.Sepal.Length = Sepal.Length > 6,
    Long.Sepal.Width = Sepal.Width > 3,
    Long.Petal.Length = Petal.Length > 2.5,
    Long.Petal.Width = Petal.Width > 0.3
  ) %>%
  group_by(Species) %>%
  summarise(
    Long.Sepal.Length = mean(Long.Sepal.Length) * 100,
    Long.Sepal.Width = mean(Long.Sepal.Width) * 100,
    Long.Petal.Length = mean(Long.Petal.Length) * 100,
    Long.Petal.Width = mean(Long.Petal.Width) * 100
  ) %>%
  pivot_longer(
    cols = starts_with("Long"),
    names_to = "Measurement",
    values_to = "Percentage_True"
  )

> long_data
# A tibble: 12 × 3
   Species    Measurement       Percentage_True
   <fct>      <chr>                       <dbl>
 1 setosa     Long.Sepal.Length               0
 2 setosa     Long.Sepal.Width               84
 3 setosa     Long.Petal.Length               0
 4 setosa     Long.Petal.Width               18
 5 versicolor Long.Sepal.Length              40
 6 versicolor Long.Sepal.Width               16
 7 versicolor Long.Petal.Length             100
 8 versicolor Long.Petal.Width              100
 9 virginica  Long.Sepal.Length              82
10 virginica  Long.Sepal.Width               34
11 virginica  Long.Petal.Length             100
12 virginica  Long.Petal.Width              100

I want to create a bar plot to compare the percentage of TRUE values for each logical variable by species. Here’s how I plan to create the plot:

# Create the bar plot
ggplot(long_data, aes(x = Measurement, y = Percentage_True, fill = Species)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    x = "Measurement",
    y = "% True",
    fill = "Species"
  )

Plot I want to add significance bars to

How can I add significance bars above each group of bars?

I’ve tried using ggpubr functions, but I’m not sure how to make it work. I envison the plot to look something like the final graph on this page:https://rpkgs.datanovia.com/ggpubr/reference/geom_pwc.html But I cannot get it to work with my logical dataset, as the plot is not using the original data but the long format.

example of what I want my significance bars to look like

Upvotes: 1

Views: 41

Answers (1)

Edward
Edward

Reputation: 19459

One (hacky) way is to use the ggsignif package, and I'm sure there is better way than what I've done.

I've done this in two stages. The first stage is to save the mutated columns before pivoting because you need this data to perform the statistical tests.

iris2 <- iris %>%
  mutate(
    Species=as.character(Species),
    Long.Sepal.Length = Sepal.Length > 6,
    Long.Sepal.Width = Sepal.Width > 3,
    Long.Petal.Length = Petal.Length > 2.5,
    Long.Petal.Width = Petal.Width > 0.3
  ) 

Then the summarise and pivot (stage 2).

long_data <- iris2 %>%
  group_by(Species) %>%
  summarise(
    Long.Sepal.Length = mean(Long.Sepal.Length) * 100,
    Long.Sepal.Width = mean(Long.Sepal.Width) * 100,
    Long.Petal.Length = mean(Long.Petal.Length) * 100,
    Long.Petal.Width = mean(Long.Petal.Width) * 100
  ) %>%
  pivot_longer(
    cols = starts_with("Long"),
    names_to = "Measurement",
    values_to = "Percentage_True"
  )

Now the tests. You are comparing proportions here. For example, the proportion of long sepals among the 3 species.

xtabs(~Species + Long.Sepal.Width, data=iris2)
            Long.Sepal.Width
Species      FALSE TRUE
  setosa         8   42
  versicolor    42    8
  virginica     33   17

I believe the correct pairwise test to use here is a proportion test. As far as I'm aware, this test (prop.test) is not available in any of the ggplot-type packages, including ggsignif - I'd be happy if someone could prove me wrong on that. Instead of prop.test (or chisq.test) from the stats package, I've used the pairwise_prop_test function from the rstatix package for convenience. I've only tested the sepal lengths here for simplicity.

p.sw <- pairwise_prop_test(xtabs(~Species + Long.Sepal.Width,
        data=iris2))
# A tibble: 3 × 5
  group1     group2            p    p.adj p.adj.signif
* <chr>      <chr>         <dbl>    <dbl> <chr>       
1 setosa     versicolor 4.11e-11 1.23e-10 ****        
2 setosa     virginica  1.06e- 6 2.12e- 6 ****        
3 versicolor virginica  6.47e- 2 6.47e- 2 ns

Now manually add the adjusted p-values to the plot. The hacky part is getting the x-positions (xmin, xmax) in the right places.

p <- ggplot(long_data, aes(x = Measurement, y = Percentage_True, fill = Species)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    x = "Measurement",
    y = "% True",
    fill = "Species"
  ) +
  scale_y_continuous(limits=c(0,120)) +
  theme_minimal()

p + geom_signif(
  textsize = 3,
  tip_length=0.02,
  annotations = p.sw$p.adj,
  y_position = 105 + c(0, 7, 14), 
  xmin = 3.7 + c(0, 0, 0.3), 
  xmax = 4.0 + c(0, 0.3, 0.3)
)

enter image description here

Upvotes: 0

Related Questions