tskpowell

Reputation: 15

Generating Statistics Summary from a ggplot in R

I'm an R novice working on a project with a script provided by my professor, and I'm having trouble getting a mean for my data that matches the box plot I created. The mean in this plot is below 300 kg per stem, but when I use

ggsummarystats( DBHdata, x = "location", y = "biomassKeith_and_Camphor", ggfunc = ggboxplot, add = "jitter" )

or

tapply(DBHdata$biomassBrown_and_Camphor, DBHdata$location, mean)

I end up with means over 600 kg per stem. Is there a way to produce summary statistics in the code for my box plot?

Box and Whisker plot of kg per stem

Upvotes: 1

Views: 1816

Answers (4)

Rfanatic

Reputation: 2280

I'm not sure I understand your question correctly, but first try calculating the group means with aggregate() and then adding them to the plot as text labels.

Sample code:

means <- aggregate(weight ~  group, PlantGrowth, mean)

library(ggplot2)

ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +
  stat_summary(fun = mean, colour = "darkred", geom = "point",
               shape = 18, size = 3, show.legend = FALSE) +
  geom_text(data = means, aes(label = weight, y = weight + 0.08))

Plot:


Sample data:

data(PlantGrowth)

Upvotes: 0

b_siepe

Reputation: 11

As others have pointed out, a boxplot shows the median by default. If you want to get the mean with ggsummarystats (which, like ggboxplot, comes from the ggpubr package), you can change the statistics it reports with the summaries argument, like this:

ggsummarystats(DBHdata, x = "location", y = "biomassKeith_and_Camphor",
ggfunc = ggboxplot, add = "jitter", summaries = c("n", "median", "iqr", "mean"))

This adds the mean alongside the standard output of n, median, and interquartile range (iqr).

Upvotes: 1

user1697590

Reputation: 183

Additionally, the data appear to be strongly right-skewed (a few very large values pull the mean upward), so a mean of over 600 despite medians of about 200 is not surprising.
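To illustrate how a single large value can separate the mean from the median, here is a minimal sketch. The numbers are made up for illustration and are not taken from DBHdata:

```r
# Hypothetical biomass values (kg per stem); one very large stem skews the data.
biomass <- c(100, 150, 180, 200, 220, 250, 300, 3400)

mean(biomass)    # 600: pulled up by the 3400 kg stem
median(biomass)  # 210: the middle of the sorted values, unaffected by the outlier
```

The box plot's centre line would sit at 210 here, even though the mean is 600.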

Upvotes: 1

edv

Reputation: 177

Box plots do not show mean values; the line inside each box is the median. That would explain the difference between the plot and your calculated means.
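To check this on your own data, you could compute both statistics per group and compare them with the plot. A rough sketch using the built-in PlantGrowth data as a stand-in (with your data you would presumably swap in your biomass column and DBHdata$location):

```r
# PlantGrowth ships with R; it stands in for DBHdata here.
data(PlantGrowth)

# Split the response by group, then compute mean and median for each group.
stats <- sapply(
  split(PlantGrowth$weight, PlantGrowth$group),
  function(x) c(mean = mean(x), median = median(x))
)

# Rows are the statistics, columns are the groups.
print(stats)
```

The median row is what should line up with the centre lines of the boxes.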

Upvotes: 1
