Reputation: 11
For context: I'm looking at multiple different correlation coefficients. For each correlation I created a bootstrapped distribution, and I'm using the bootstrap percentile method to create confidence intervals for each coefficient. As I'm looking at multiple correlations, I'm using a more stringent alpha level, and I'll need to repeat this analysis for different data sets with different alpha corrections in the future. All of that has gone well, but I'm struggling to create a graph to represent the custom intervals as error bars.
Question: How do I create a graph in ggplot to represent the median values of my data along with custom percentiles for my error bars? My data is in a data.frame with one variable identifying the group (Analysis) and a second variable (Dist) with all of the scores in the group. There are 10,000 cases for each level of the "Analysis" variable, for a total of 40,000 rows. For brevity, I've included a printout of selected rows immediately below.
>BootDistOverall[c(1:2,10000:10002,20000:20002,30000:30002),]
Analysis Dist
1 Alpha by Consequences (No Outlier) -0.4286326
2 Alpha by Consequences (No Outlier) -0.4191646
10000 Alpha by Consequences (No Outlier) -0.5248891
10001 Alpha by Past-30-Day Binge Drinking -0.2972018
10002 Alpha by Past-30-Day Binge Drinking -0.3011621
20000 Alpha by Past-30-Day Binge Drinking -0.4145920
20001 Q0 by Consequences 0.3689336
20002 Q0 by Consequences 0.4540535
30000 Q0 by Consequences 0.5772917
30001 Q0 by Past-30-Day Binge Drinking 0.6655952
30002 Q0 by Past-30-Day Binge Drinking 0.4412748
I've been able to create a violin plot of the data using ggplot (see link and code below), but I'd really like to have the median values of each distribution represented as well as the percentiles as error bars. I can get median values or a boxplot to represent this data, but I need the custom percentiles.
p0 <- ggplot(BootDistOverall, aes(Analysis, Dist)) +
geom_violin(scale = "area",
color = "#002344",
size = 1,
fill = "#FECB00")+
ylim(-1,1)+
geom_hline(yintercept = 0,
linetype = "dashed",
color = "black")+
xlab("Analysis")+
ylab("Bootstrapped Pearson's r")+
coord_flip()+
theme_bw()
I need help creating a similar graph, but with points for the median and error bars corresponding to my custom percentiles. I've tried multiple different methods (geom_errorbar, geom_pointrange), and I can't seem to get any of them to work. The only approach I've been able to make work is adding line segments to the graph individually, like I would do in base R graphics with arrows() (see below for code and link), but there has to be a better way. I'm new to ggplot so there may be a simple fix, but I'm at my wits' end.
#Create percentile points
Uppers = c(
quantile(BootDist2$Dist, .995,na.rm=T),
quantile(BootDist4$Dist, .995,na.rm=T),
quantile(BootDist1$Dist, .995,na.rm=T),
quantile(BootDist3$Dist, .995,na.rm=T))
Lowers = c(
quantile(BootDist2$Dist, .005,na.rm=T),
quantile(BootDist4$Dist, .005,na.rm=T),
quantile(BootDist1$Dist, .005,na.rm=T),
quantile(BootDist3$Dist, .005,na.rm=T))
#Create a point graph
ggplot(BootDistOverall, aes(x=Analysis,y=Dist))+
stat_summary(fun.y = median,
geom = "point",
shape=22,
size=5,
color = "#002344",
fill = "#FECB00")+
theme_bw()+
coord_flip()+
ylim(-1,1)+
geom_hline(yintercept = 0,
linetype = "dashed",
color = "black")+
xlab("Analysis")+
ylab("Bootstrapped Pearson's r")+
#Add error bars with geom_segment
geom_segment(x=1,xend=1,y=Lowers[1],yend=Uppers[1])+
geom_segment(x=2,xend=2,y=Lowers[2],yend=Uppers[2])+
geom_segment(x=3,xend=3,y=Lowers[3],yend=Uppers[3])+
geom_segment(x=4,xend=4,y=Lowers[4],yend=Uppers[4])+
geom_segment(x=.9,xend=1.1,y=Lowers[1],yend=Lowers[1])+
geom_segment(x=.9,xend=1.1,y=Uppers[1],yend=Uppers[1])+
geom_segment(x=1.9,xend=2.1,y=Lowers[2],yend=Lowers[2])+
geom_segment(x=1.9,xend=2.1,y=Uppers[2],yend=Uppers[2])+
geom_segment(x=2.9,xend=3.1,y=Lowers[3],yend=Lowers[3])+
geom_segment(x=2.9,xend=3.1,y=Uppers[3],yend=Uppers[3])+
geom_segment(x=3.9,xend=4.1,y=Lowers[4],yend=Lowers[4])+
geom_segment(x=3.9,xend=4.1,y=Uppers[4],yend=Uppers[4])
Upvotes: 1
Views: 1431
Reputation: 69221
Taking a bit of a leap of faith here in assuming that the MAX value for each analysis group is what you want to plot as the upper end of the error bar, the MIN value is the lower end of the error bar, and what is left over should be the median. Note: you only provided two rows for Q0 by Past-30-Day Binge Drinking, so this is likely a bad assumption... you'll need to modify accordingly to match whatever your data actually represent...
...on to how to set up your data to plot in ggplot(): the working paradigm is that you have one variable per aesthetic. In order to plot an error bar, you need x, y, ymin, and ymax. Once you reformat your data to match this, the plotting is straightforward. Here's a working example:
library(data.table)
library(ggplot2)
d <- structure(list(Analysis = c("Alpha by Consequences (No Outlier)",
"Alpha by Consequences (No Outlier)", "Alpha by Consequences (No Outlier)",
"Alpha by Past-30-Day Binge Drinking", "Alpha by Past-30-Day Binge Drinking",
"Alpha by Past-30-Day Binge Drinking", "Q0 by Consequences",
"Q0 by Consequences", "Q0 by Consequences", "Q0 by Past-30-Day Binge Drinking",
"Q0 by Past-30-Day Binge Drinking"), Dist = c(-0.4286326, -0.4191646,
-0.5248891, -0.2972018, -0.3011621, -0.414592, 0.3689336, 0.4540535,
0.5772917, 0.6655952, 0.4412748), var = c("median", "upper",
"lower", "upper", "median", "lower", "lower", "median", "upper",
"upper", "lower")), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))
#impute which row is the min, max, and median - NOTE you only gave two rows for the last Analysis group
d[, var := ifelse(Dist == min(Dist), "lower", ifelse(Dist == max(Dist), "upper", "median")), by = Analysis]
#cast into one row per Analysis
d_wide <- dcast(Analysis ~ var, data = d, value.var = "Dist")
#plot
ggplot(d_wide, aes(Analysis, median, ymin = lower, ymax = upper)) +
geom_errorbar(width = .4) +
geom_point(colour = "orange", size = 4) +
coord_flip() +
theme_bw()
#> Warning: Removed 1 rows containing missing values (geom_point).
Created on 2019-03-09 by the reprex package (v0.2.1)
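If the error bars should be the percentile bounds from the question (.005 and .995) rather than the min and max, the same one-variable-per-aesthetic idea still applies: summarise each Analysis group to a median, a lower bound, and an upper bound first, then plot the summary. Below is a minimal sketch, not a definitive implementation. The data are simulated stand-ins (boot_dist and its two group labels are hypothetical, not your BootDistOverall), and alpha is an assumed name for whatever corrected alpha level you are using.

```r
library(data.table)
library(ggplot2)

# Simulated stand-in for BootDistOverall (hypothetical values, not the real data)
set.seed(1)
boot_dist <- data.table(
  Analysis = rep(c("Analysis A", "Analysis B"), each = 10000),
  Dist     = c(rnorm(10000, -0.4, 0.05), rnorm(10000, 0.5, 0.05))
)

# Assumed alpha level; 0.01 reproduces the .005 / .995 cut-offs in the question
alpha <- 0.01

# One row per Analysis: the median plus the lower and upper percentile bounds
boot_summary <- boot_dist[, .(
  median = median(Dist, na.rm = TRUE),
  lower  = quantile(Dist, alpha / 2, na.rm = TRUE),
  upper  = quantile(Dist, 1 - alpha / 2, na.rm = TRUE)
), by = Analysis]

# Each aesthetic (x, y, ymin, ymax) now maps to exactly one column
p <- ggplot(boot_summary, aes(Analysis, median, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "black") +
  geom_errorbar(width = 0.2, colour = "#002344") +
  geom_point(shape = 22, size = 5, colour = "#002344", fill = "#FECB00") +
  ylim(-1, 1) +
  xlab("Analysis") +
  ylab("Bootstrapped Pearson's r") +
  coord_flip() +
  theme_bw()
print(p)
```

Because the bounds are derived from alpha, rerunning this on another data set with a different correction should only require changing that one value.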
Upvotes: 1