Reputation: 1353
I am struggling with the following code using the iris dataset. I would like to draw a violin plot, only including setosa species, and do some complex overlapping combinations on subgroups of data.
Specifically, on the x-axis, I would like to first divide continuous Sepal.Length data into groups: group A=Sepal.Length < 4.7, group B=Sepal.Length 4.7 - 5, group C=Sepal.Length 5 - 5.2 and group D=Sepal.Length > 5.2.
Then, I would like to plot four violins/boxes on the x-axis, single/overlapping groups: "B", "A+C", "D", "A+C+D". The y-axis is simply "Petal.Length".
I am also including code to show the sample size (n) for each violin.
I would appreciate any suggestions. Thank you.
library(dplyr)
library(ggplot2)
library(ggpubr)
# Define order of violins on x-axis.
order <- c("B", "AC", "D", "ACD")
# Function to display sample size (n) for each violin.
give.n <- function(x){return(c(y = min(Petal.Length), label = length(x)))}
iris %>%
filter(Species == "setosa") %>%
mutate(sub_a = case_when( Sepal.Length < 4.7~"A",
Sepal.Length < 5~ "B",
Sepal.Length < 5.2~ "C",
TRUE~"D")) %>%
mutate(collapsed = c((ifelse(sub_a %in% c("A", "C"), "AC", sub_a)), (ifelse(sub_a %in% c("AC", "D"), "ACD", sub_a)))) %>%
ggviolin(iris[iris$Species == "setosa", ], x=collapsed, y=Petal.Length) + scale_x_discrete(limits=order) + stat_summary(fun.data = give.n, geom = "text")
Edit
See below for expected result. Note that the numbers below each violin are accurate. The rest of the image is only an example of expected outcome.
Upvotes: 1
Views: 665
Reputation: 9923
I cannot see how to do this as a single chain, but here is a brute-force solution that uses cut
and then bind_rows
setosa <- iris %>% filter(Species == "setosa") %>%
mutate(group = cut(Sepal.Length, breaks = c(0, 4.7, 5, 5.2, Inf), labels = c("A", "B", "C", "D"), right = FALSE))
bind_rows(B = setosa %>% filter(group == "B"),
AC = setosa %>% filter(group %in% c("A", "C")),
D = setosa %>% filter(group == "D"),
ACD = setosa %>% filter(group %in% c("A", "C", "D")),
.id = "group2"
) %>%
mutate(group2 = factor(group2, levels = c("B", "AC", "D", "ACD"))) %>%
ggplot(aes(x = group2, y = Petal.Length)) +
geom_violin()
Upvotes: 3