ChrisD

Reputation: 60

ggplot box plot subdivided by color, with means in the middle of each box

I've got data with two categorical variables. I can boxplot these but I can't get the means to display in the correct position. I've created the effect in the iris dataset (the red rectangles are added by hand, not in ggplot).

Means are not plotted with the relevant boxplot

library(tidyverse)

Iris <- iris %>%
        mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

means <- Iris %>% 
        group_by(Species, SepalLengthType) %>% 
        summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")
plot <- ggplot(data = Iris, aes(y=Sepal.Width, x = SepalLengthType, colour = Species))+
        geom_boxplot()

Now I want to add the means to each box plot. The lines below all run, but the mean is centred on the SepalLengthType category rather than on its own box plot.

plot + stat_summary(fun = "mean" , aes(color = Species), shape = 15)
plot + stat_summary(fun = "mean" , aes(group = Species), shape = 15)
plot + stat_summary(fun.y = "mean", shape = 15) # this works, but is deprecated
plot + geom_point(data = means, aes(color = Species), shape = 15)

How can the means be displayed in the middle of each box plot? I appreciate I could re-arrange the data so each set of data points is in its own column, but as they are not all the same length, this needs its own work-arounds.

When I use fun = "mean" I get the warning message "Removed 5 rows containing missing values (geom_segment)." Why is that? The 'means' line does not have this problem, but I'd rather not have to calculate the means myself.

Upvotes: 1

Views: 414

Answers (1)

UseR10085

Reputation: 8176

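You can dodge the means by the same width as the boxes: use position = position_dodge(0.9) for the box plot and position_dodge2(width = 0.9) for the points, as in the following code: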

library(tidyverse)

Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

means <- Iris %>% 
  group_by(Species, SepalLengthType) %>% 
  summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")

plot <- ggplot(data = Iris, aes(y=Sepal.Width, x = SepalLengthType, colour = Species))+
  geom_boxplot(position=position_dodge(0.9))

plot + geom_point(data = means, aes(color = Species), shape = 15, 
                  position = position_dodge2(width = 0.9))


Or, using stat_summary:

plot + stat_summary(fun = "mean", aes(group = Species), shape = 15, 
                  position = position_dodge2(width = 0.9))
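
On the warning in the question: as far as I know, stat_summary() draws with geom = "pointrange" by default. When only fun is supplied there is no ymin/ymax for the range segment, so ggplot2 drops those segments, one per group, which would account for the 5 rows. A minimal sketch, assuming that is the cause: request geom = "point" and reuse a single dodge object for both layers. The `dodge` object and `size = 3` are my own additions for illustration, not part of the question.

```r
library(dplyr)
library(ggplot2)

Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

# One shared dodge object (illustrative name) so boxes and means
# are offset by exactly the same amount.
dodge <- position_dodge(width = 0.9)

p <- ggplot(Iris, aes(x = SepalLengthType, y = Sepal.Width, colour = Species)) +
  geom_boxplot(position = dodge) +
  # geom = "point" draws only the mean, so no range segment is needed
  # and no rows should be dropped for missing ymin/ymax.
  stat_summary(fun = mean, geom = "point", shape = 15, size = 3,
               position = dodge)

p
```

Because colour = Species is set in the top-level aes(), stat_summary() groups by species within each SepalLengthType, so no separate means data frame is required.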


Upvotes: 2
