Brennan

Reputation: 429

How to manually create box plots in R with two categories on x-axis

I have seen very similar questions but none that quite fit exactly what I am trying to do. I have the following RWE:

n1 = 177
avg1 = 7.508192
sd1 = 5.439677
error1 <- qnorm(.975) * sd1/sqrt(n1)
avg1 - error1
avg1 + error1

n2 = 93
avg2 = 6.713011
sd2 = 3.22479
error2 <- qnorm(.975) * sd2/sqrt(n2)
avg2 - error2
avg2 + error2

By computing avg +/- error I can see the extent to which the confidence intervals overlap; however, I want to plot these two sets of data side by side, with their means and confidence intervals, in a single graphic. I want to be able to label the x-axis as "data1" and "data2". I have looked through the boxplot functionality and can't figure out how to do this when I am not using raw data but rather manually computed confidence intervals. I am not sure if boxplot is the proper function to use, but it's in the ballpark of what I am looking for. Any advice, places to look, or simple oversights on my part here?

Upvotes: 1

Views: 1008

Answers (2)

StupidWolf

Reputation: 46908

If you want to plot the 95% CI, boxplot in base R might not be the best choice, because you would have to use the whiskers as the confidence interval. You can instead use geom_point() in combination with geom_errorbar() from ggplot2. Here is an example dataset I created with your values:

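library(ggplot2)

# Example data frame built from avg1/avg2 and error1/error2 defined in the question
x = data.frame(
  x = rep(c("a","b"), each=2),    # category shown on the x-axis
  data = rep(c("A","B"), 2),      # group within each category, mapped to colour
  avg = rep(c(avg1, avg2), 2),
  lower = rep(c(avg1 - error1, avg2 - error2), 2),
  upper = rep(c(avg1 + error1, avg2 + error2), 2)
)

# Points mark the means, error bars span the 95% CI; dodging separates the groups
ggplot(x, aes(x=x, y=avg, col=data, ymin=lower, ymax=upper)) +
  geom_point(position=position_dodge(width=0.1)) +
  geom_errorbar(width=0.1, position=position_dodge(width=0.1))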


Upvotes: 1

Phil

Reputation: 185

boxplot and bwplot require the actual data for plotting. But there is a function called bxp (in the graphics package) that takes summary statistics as input and produces boxplots from them. It expects a list of parameters like the one returned by a call to boxplot, so you want to read both ?bxp and the Value section of ?boxplot. At a minimum, your list needs to contain the stats and names elements. However, visualizing the summary data you show above in a boxplot will most likely confuse others, because you are using numbers that differ from what a boxplot usually shows; see the Details section of ?boxplot.stats for the definitions commonly used.
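
A minimal sketch of the bxp approach, using the numbers from the question. The mapping of the mean and CI bounds onto the five rows of stats is my own assumption, not something bxp prescribes: the mean sits where the median line normally goes, and the CI bounds are used for both hinges and whiskers, so the box spans the interval and no separate whiskers show. The list mirrors all the components that boxplot returns (stats, n, conf, out, group, names) rather than a minimal subset.

```r
# Summary statistics from the question
n   <- c(177, 93)
avg <- c(7.508192, 6.713011)
s   <- c(5.439677, 3.22479)
err <- qnorm(0.975) * s / sqrt(n)   # half-width of the 95% CI

# Rows follow the boxplot.stats order:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
# ASSUMPTION: CI bounds stand in for hinges/whiskers, the mean for the median.
stats <- rbind(avg - err, avg - err, avg, avg + err, avg + err)

z <- list(
  stats = stats,
  n     = n,
  conf  = rbind(avg - err, avg + err),  # only drawn if notch = TRUE
  out   = numeric(0),                   # no outliers
  group = numeric(0),
  names = c("data1", "data2")
)

bxp(z, ylab = "mean with 95% CI")
```

Because the boxes no longer show quartiles, the axis label or caption should say explicitly what they represent.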

Your data suggest that you want to plot the mean and indicate the 95% confidence interval. That could be accomplished with a barplot plus error bars. There are many ways of doing that in R; see this post: Add error bars to a barplot
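
One possible base-graphics version of a barplot with error bars, as a sketch using the values from the question. Drawing the bars with arrows() is just one of the many ways mentioned; the variable names and the 10% headroom on the y-axis are arbitrary choices of mine.

```r
# Summary statistics from the question, named so barplot() labels the x-axis
n   <- c(data1 = 177, data2 = 93)
avg <- c(data1 = 7.508192, data2 = 6.713011)
s   <- c(data1 = 5.439677, data2 = 3.22479)
err <- qnorm(0.975) * s / sqrt(n)   # half-width of the 95% CI

# barplot() returns the x positions of the bar midpoints
mids <- barplot(avg, ylim = c(0, max(avg + err) * 1.1), ylab = "mean")

# angle = 90 with code = 3 draws flat caps at both ends of each segment
arrows(mids, avg - err, mids, avg + err, angle = 90, code = 3, length = 0.1)
```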

Of course, a proper boxplot gives a lot more information than a "dynamite plot" (aka barplot with error bars). So if you do have the raw data, or can get the summary statistics necessary to construct one, that would be preferable.

Upvotes: 1
