I'm an R novice working on a project with a script provided by my professor, and I'm having trouble getting a mean for my data that matches the box plot I created. The mean in this plot is below 300 kg per stem, but the mean I get when I use
ggsummarystats(DBHdata, x = "location", y = "biomassKeith_and_Camphor", ggfunc = ggboxplot, add = "jitter")
or
tapply(DBHdata$biomassBrown_and_Camphor, DBHdata$location, mean)
I end up with means over 600 kg per stem. Is there a way to produce summary statistics in the code for my box plot?
[Image: box-and-whisker plot of kg per stem]
I'm not sure I understand your question correctly, but try first calculating the group means with aggregate() and then adding them to the plot as text labels.
Sample code:
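library(ggplot2)
# Compute the mean weight of each group
means <- aggregate(weight ~ group, PlantGrowth, mean)
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +  # the middle line of each box is the median
  # Add each group's mean as a dark red diamond
  stat_summary(fun = mean, colour = "darkred", geom = "point",
               shape = 18, size = 3, show.legend = FALSE) +
  # Label the (rounded) means just above the diamonds
  geom_text(data = means, aes(label = round(weight, 2), y = weight + 0.08))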
Plot:
Sample data:
data(PlantGrowth)
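As a base-R complement to the aggregate() approach, the sketch below puts each group's mean next to its median, which is the value a box plot's middle line marks. It uses only the built-in PlantGrowth data; the object names `medians` and `summary_tbl` are illustrative, not part of any package.

```r
data(PlantGrowth)  # built-in example data (datasets package)

# Group means, as in the answer above
means <- aggregate(weight ~ group, PlantGrowth, mean)
# Group medians: the value drawn as the middle line of each box
medians <- aggregate(weight ~ group, PlantGrowth, median)

# Both tables share the column "weight", so merge() applies the suffixes,
# giving weight_mean and weight_median
summary_tbl <- merge(means, medians, by = "group",
                     suffixes = c("_mean", "_median"))
print(summary_tbl)
```

Comparing the two columns shows how far the mean and the median can drift apart within a group.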
As others have pointed out, a box plot shows the median by default. If you want ggsummarystats() (from the ggpubr package) to report the mean as well, you can choose which statistics it computes with the summaries argument:
ggsummarystats(DBHdata, x = "location", y = "biomassKeith_and_Camphor",
               ggfunc = ggboxplot, add = "jitter",
               summaries = c("n", "median", "iqr", "mean"))
This adds the mean alongside the default summaries: n, median, and interquartile range (iqr).
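If you want similar numbers as a plain table rather than under the plot, a rough base-R analogue of summaries = c("n", "median", "iqr", "mean") is sketched below. This is a swapped-in approach, not ggpubr's internals, and the small data frame is made-up stand-in data with hypothetical column names `location` and `biomass`; with the real data you would pass DBHdata and your own biomass column instead.

```r
# Made-up stand-in data (kg per stem); column names are hypothetical
demo_data <- data.frame(
  location = rep(c("A", "B"), each = 5),
  biomass  = c(120, 150, 180, 210, 240,
               90, 130, 160, 200, 240)
)

# n, median, interquartile range and mean of one group's values
group_stats <- function(v) {
  c(n = length(v), median = median(v), iqr = IQR(v), mean = mean(v))
}

# split() gives one vector per location; sapply() returns one column per
# location, and t() turns that into one row per location
stats_by_location <- t(sapply(split(demo_data$biomass, demo_data$location),
                              group_stats))
print(stats_by_location)
```

For this stand-in data, location A gives n = 5, median = 180, IQR = 60 and mean = 180, while location B gives n = 5, median = 160, IQR = 70 and mean = 164.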
Additionally, the data appear to be strongly right-skewed (a long tail of large values), so a mean over 600 alongside medians of about 200 is not surprising.
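A small made-up example shows how a single very large stem can pull the mean far above the median while barely moving the median. The numbers are illustrative only, not taken from the question's data.

```r
# Illustrative biomass values (kg per stem), not the question's data
stems <- c(150, 180, 200, 220, 250)
with_large <- c(stems, 2500)  # the same stems plus one very large stem

print(c(mean = mean(stems), median = median(stems)))
print(c(mean = mean(with_large), median = median(with_large)))
```

Adding the 2500 kg stem moves the median only from 200 to 210, but moves the mean from 200 to about 583: the same pattern as medians near 200 with means over 600.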
The boxplots show medians, not means, which could explain the difference you are seeing between the plot and your calculations.
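To check which value the middle line of a box represents, base R's boxplot.stats() returns the five numbers a box plot draws, and the third one is the median. The vector below is made up for illustration. ggplot2's geom_boxplot() likewise draws the 50% quantile (the median) as its middle line, although its hinges are computed with quantile() and can differ slightly from the Tukey hinges that boxplot.stats() uses.

```r
# Made-up biomass values (kg per stem) with one very large stem
biomass <- c(150, 180, 200, 220, 250, 2500)

box <- boxplot.stats(biomass)
# box$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(box$stats)
print(box$stats[3] == median(biomass))  # TRUE: the middle line is the median
print(mean(biomass))                    # the mean is not part of the box
print(box$out)                          # values beyond the whiskers, drawn as points
```

Here the middle line sits at 210 while the mean is about 583, and the 2500 kg stem lies beyond the upper whisker, so it is plotted as an individual outlier point.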