Luigi Degni

Reputation: 21

Add density distribution to boxplot in R

I am trying to plot a boxplot in R with ggplot2 and, to its right, I want to add a density distribution of the unpaired mean difference between the two conditions.

I am able to draw the boxplot. Also, I can plot the density distribution using the dabest package for R. The problem is that I can't manage to add the distribution to the right of the boxplot (in the same figure).

This figure shows what I have so far: boxplotDens

On the left I have the boxplot. The vertical line, highlighted within the purple box, shows my attempt to insert a density distribution and where I would like it to be drawn.

The plot on the right shows the unpaired mean difference distribution (highlighted within the red box).

IMPORTANT NOTE: the density distribution is not just a distribution of my 2 variables but the distribution of the mean difference as generated by dabest.

This is the code that I wrote so far, with the dataset reproduced using the dput() function.

dati = structure(list(Gen_index = c(-0.00343550869493355, -0.000512183151748252, 
0.00139426380539972, 0.0275725594834101, 0.0126799465057088, 
0.00320829189195402, 0.01812225100518, 0.00529620409323125, 0.0209152331716265, 
-0.00145167919921569, 0.0320459849991131, 0.0149825463814834, 
0.0502934366927441, 0.013005055314573, -0.00477085474565764, 
0.0138333676098974, -0.00413766184101653, 0.0108210245511905, 
-0.0130742164666196, 0.038012078129249, -0.0199214805648889, 
0.0278553527554661, 0.0222158839803678, 0.0456732475430057, 0.0127870386211424, 
0.0215400931156267, -0.015830282102788, 0.0294794928746215, 0.0121618200064815, 
0.0142731704932927, -0.0029645627865988, 0.00696345357967932, 
0.00972165212030399, -4.67543539177211e-05, 0.0168234233489839, 
0.00403121883677643, 0.0242027036110072, -0.00932965953571428
), Spec_index = c(1.025, 0.175, 0.03, -0.6, 0.505, 0.895, -0.29, 
0.49, 0.19, -0.4, 0.215, 0.07, -0.05, 0.15, 0.52, -0.87, 0.22, 
-0.345, -0.78, 0.62, 0.055, 1.015, -0.505, 0.96, 0.475, -0.02, 
0.105, 0.945, -0.705, -0.565, -0.025, 0.08, 0, 0.305, -0.255, 
0.005, 0.795, 0.435), Gender = c("F", "F", "M", "M", "M", "M", 
"F", "F", "M", "M", "F", "F", "M", "M", "F", "M", "F", "M", "F", 
"F", "F", "M", "M", "F", "M", "M", "F", "M", "M", "M", "F", "M", 
"F", "F", "M", "F", "M", "F")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -38L))

This is the code I wrote for the boxplots:

library(readxl)
library(ggplot2)
library(dplyr)
library(psych)
library(tidyr)
library(cowplot)
library(reshape)
library(Rmisc)
library(dabestr)
Males = dati %>% 
  filter(Gender == "M")
Males = Males$Gen_index
Females = dati %>% 
  filter(Gender == "F")
Females = Females$Gen_index
mean_diff <- mean(dati$Gen_index[dati$Gender == "M"]) - mean(na.omit(dati$Gen_index[dati$Gender == "F"]))
sd_diff <- sqrt(var(dati$Gen_index[dati$Gender == "M"])/length(dati$Gen_index[dati$Gender == "M"]) + var(dati$Gen_index[dati$Gender == "F"])/length(dati$Gen_index[dati$Gender == "F"]))
conf_interval <- qt(0.975, df = length(dati$Gen_index[dati$Gender == "M"]) + length(dati$Gen_index[dati$Gender == "F"]) - 2) * sd_diff
ggplot(dati, aes(x = Gender, y = Gen_index)) +
  geom_jitter(aes(colour = Gender), width = 0.05) +
  geom_boxplot(aes(colour = Gender),outlier.shape= NA) +
  geom_point(aes(x = 1, y = mean(na.omit(Females)))) +
  geom_point(aes(x = 2, y = mean(na.omit(Males)))) +
  geom_segment(aes(x = 1, xend = 2, y = mean(na.omit(Females)), yend = mean(Males), linetype = "dashed")) +
  geom_errorbar(aes(x = 3, ymin = mean_diff - conf_interval, ymax = mean_diff + conf_interval), width = 0.2) +
  geom_density(alpha = .3, colour = NA, trim=FALSE) +
  labs(title = "Difference in mean between two groups", x = "", y = "General PIT") +
  theme_classic() +
  theme(legend.position = "bottom")    

           

This is the code for unpaired mean difference on dabest:

two.group.unpaired <- 
  dati %>%
  dabest(Gender, Gen_index, 
         # The idx below passes "F" as the control group
         # and "M" as the test group. The mean difference
         # will be computed as mean(M) - mean(F).
         idx = c("F", "M"), 
         paired = FALSE)
two.group.unpaired 

plot(mean_diff(two.group.unpaired), color.column = Gender)

Thank you

This is pretty much what I am trying to achieve: Desired result

Upvotes: 2

Views: 472

Answers (1)

Glu

Reputation: 327

This should be close to what you are looking for.

First, generate the boxplot. Note that I set the y-axis limits manually and save the result in myBoxplot:

Males = dati %>% 
  filter(Gender == "M")
Males = Males$Gen_index
Females = dati %>% 
  filter(Gender == "F")
Females = Females$Gen_index
mean_diff <- mean(dati$Gen_index[dati$Gender == "M"]) - mean(na.omit(dati$Gen_index[dati$Gender == "F"]))
sd_diff <- sqrt(var(dati$Gen_index[dati$Gender == "M"])/length(dati$Gen_index[dati$Gender == "M"]) + var(dati$Gen_index[dati$Gender == "F"])/length(dati$Gen_index[dati$Gender == "F"]))
conf_interval <- qt(0.975, df = length(dati$Gen_index[dati$Gender == "M"]) + length(dati$Gen_index[dati$Gender == "F"]) - 2) * sd_diff
myBoxplot = ggplot(dati, aes(x = Gender, y = Gen_index)) +
  geom_jitter(aes(colour = Gender), width = 0.05) +
  geom_boxplot(aes(colour = Gender),outlier.shape= NA) +
  geom_point(aes(x = 1, y = mean(na.omit(Females)))) +
  geom_point(aes(x = 2, y = mean(na.omit(Males)))) +
  # linetype is a fixed setting, so it goes outside aes()
  geom_segment(aes(x = 1, xend = 2, y = mean(na.omit(Females)), yend = mean(Males)), linetype = "dashed") +
  geom_errorbar(aes(x = 3, ymin = mean_diff - conf_interval, ymax = mean_diff + conf_interval), width = 0.2) +
  # round to 2 decimals; rounding to 1 would collapse nearly all breaks to 0
  scale_y_continuous(breaks = round(seq(-0.02, 0.05, by = 0.01), 2), limits = c(-0.02, 0.05)) +
  geom_density(alpha = .3, colour = NA, trim = FALSE) +
  labs(title = "Difference in mean between two groups", x = "", y = "General PIT") +
  theme_classic() +
  theme(legend.position = "bottom")
myBoxplot

Then, generate your estimation statistics and the Cumming plot:

two.group.unpaired <- 
  dati %>%
  dabest(Gender, Gen_index, 
         # The idx below passes "F" as the control group
         # and "M" as the test group. The mean difference
         # will be computed as mean(M) - mean(F).
         idx = c("F", "M"), 
         paired = FALSE)
two.group.unpaired 

plot(mean_diff(two.group.unpaired), color.column = Gender)

I am not 100% sure, but I think the data you are trying to plot are stored in densDataExtracted$result$bootstraps[[1]], where densDataExtracted is the object returned by mean_diff().

If that's correct, you can extract them as follows:

# Compute the mean difference and pull out the bootstrap resamples
densDataExtracted <- mean_diff(two.group.unpaired)
densityData <- as.data.frame(densDataExtracted$result$bootstraps[[1]])
colnames(densityData)[1] <- "densityDis"
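
If the bootstraps element is missing or named differently in your dabestr version (I have not checked this across versions), a distribution of the same kind can be approximated by hand. Here is a minimal sketch, assuming simulated stand-in data and a plain nonparametric bootstrap, which may differ in detail from what dabestr does internally. The names males, females and n_boot are hypothetical and not part of dabestr:

```r
set.seed(42)  # make the resampling reproducible

# Simulated stand-in data (assumption); replace with dati$Gen_index split by Gender
males   <- rnorm(20, mean = 0.015, sd = 0.017)
females <- rnorm(18, mean = 0.010, sd = 0.018)

n_boot <- 5000  # number of bootstrap resamples (arbitrary choice)

# Resample each group with replacement and store mean(M) - mean(F)
boot_diff <- replicate(n_boot, {
  mean(sample(males, replace = TRUE)) - mean(sample(females, replace = TRUE))
})

# Same one-column layout as the data frame extracted from dabest
densityData <- data.frame(densityDis = boot_diff)

# Percentile 95% interval of the bootstrapped mean difference
quantile(densityData$densityDis, probs = c(0.025, 0.975))
```

The resulting densityData has the same single column, densityDis, so it can be passed to the density plot in the next step unchanged.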

Finally, you can create a density plot of those data and add it to the boxplot:

densConnectdBox <- ggplot(densityData, aes(x = densityDis)) + 
  geom_density(alpha = .3, fill="lightblue", trim=FALSE) +
  theme_void() + 
  theme(legend.position = "none") +
  scale_x_continuous(breaks = round(seq(-0.02, 0.05, by = 0.01), 2), limits = c(-0.02, 0.05)) +
  coord_flip()

myBoxplot + densConnectdBox

Note that for this to work you need to install the patchwork package and load it before combining the two plots:

library(patchwork)

Also note that I manually set the limits of the distribution to be the same as those of the boxplot.
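
One caveat with scale limits: observations outside them are dropped with a warning, and in dati one Gen_index value (about 0.0503) lies just above 0.05. A possible alternative, sketched below with simulated stand-in data, is to zoom both panels with coord_cartesian() and a shared range, and to let patchwork control the panel widths. The names raw, boot, y_range, p_box and p_dens are hypothetical, and mapping the density to the y aesthetic assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)
library(patchwork)

set.seed(1)
# Simulated stand-in data (assumption); use dati and densityData instead
raw  <- data.frame(Gender = rep(c("F", "M"), each = 20),
                   Gen_index = rnorm(40, mean = 0.01, sd = 0.017))
boot <- data.frame(densityDis = rnorm(5000, mean = 0.005, sd = 0.005))

# One shared range covering both the raw data and the bootstrap values
y_range <- range(c(raw$Gen_index, boot$densityDis))

p_box <- ggplot(raw, aes(x = Gender, y = Gen_index)) +
  geom_boxplot(aes(colour = Gender), outlier.shape = NA) +
  geom_jitter(aes(colour = Gender), width = 0.05) +
  coord_cartesian(ylim = y_range) +  # zooms without removing observations
  theme_classic() +
  theme(legend.position = "bottom")

# Mapping to y draws the density sideways, so coord_flip() is not needed
p_dens <- ggplot(boot, aes(y = densityDis)) +
  geom_density(alpha = 0.3, fill = "lightblue") +
  coord_cartesian(ylim = y_range) +
  theme_void()

# Boxplot panel three times as wide as the density panel
combined <- p_box + p_dens + plot_layout(widths = c(3, 1))
combined
```

Because both panels use the same ylim, the density lines up with the boxplot's y axis, and no points are discarded.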

Let me know if this works!

Upvotes: 1
