user08041991

Reputation: 637

boxplot using ggplot with n > 5

I am sure this question has been asked before, but I was unable to find anything similar. So consider a simple worked example.

We create random data and then create boxplots:

set.seed(123456)
Ax <- sample(1:3, size = 75, replace = TRUE)
Fac <- sample(LETTERS[1:4], 75, replace = TRUE)
yvalue <- runif(75)

df1 <- data.frame(Ax, Fac, yvalue)

library(ggplot2)
ggplot(df1, aes(factor(Ax), yvalue, colour = Fac)) + 
  geom_boxplot()

But when we look at the data more closely:

table(df1$Ax, df1$Fac)

I want to create a boxplot like the one above, but when a group's size (n) is less than 6, then either:

That is, for the data circled in red.

Upvotes: 1

Views: 180

Answers (2)

Roman

Reputation: 17648

You can try:

Add a column holding the number of observations in each Ax/Fac group using ave():

df1$length <- ave(df1$yvalue, interaction(df1$Ax, df1$Fac), FUN=length)
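To see what ave() is doing here, a minimal sketch on a toy data frame may help. The names toy, g, y and n are made up for illustration and are not part of the question's data. ave() applies FUN within each group and returns a vector as long as its input, so each row ends up carrying the size of its own group.

```r
# Toy data (hypothetical): group "a" has 3 rows, group "b" has 1 row
toy <- data.frame(g = c("a", "b", "a", "a"), y = c(0.2, 0.5, 0.9, 0.1))

# length() is evaluated per group and the result is recycled back
# onto the rows of that group
toy$n <- ave(toy$y, toy$g, FUN = length)

toy$n
```

With two grouping columns, as in df1, wrapping them in interaction() (as above) or passing them to ave() as separate arguments should give the same per-row counts.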

Now, for instance, map alpha to that column so the small groups are plotted as uncoloured/shaded boxes:

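ggplot(df1, aes(factor(Ax), yvalue, fill = Fac, alpha = length >= 6)) +
  geom_boxplot() +
  # groups with fewer than 6 observations are drawn at half opacity
  scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.5))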


Upvotes: 2

gowerc

Reputation: 1099

If you don't care about keeping placeholder spaces where the boxplots used to be, you can simply remove the observations that don't meet your criteria. The example below uses dplyr for the data manipulation:

library(dplyr)
library(ggplot2)

### Identify all groups that have > 5 observations per group
df2 <- df1 %>%
  group_by(Fac, Ax) %>%
  summarise(n = n()) %>%
  filter(n > 5)

### Only keep groups that meet our criteria
df3 <- df1 %>% semi_join(df2, by = c("Fac", "Ax"))

ggplot(df3, aes(factor(Ax), yvalue, colour = Fac)) + 
  geom_boxplot()
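As a possible shortcut, the count-then-join steps can probably be collapsed into a single grouped filter. This is only a sketch: df3_alt is a name introduced here for illustration, and the data is recreated from the question so the block runs on its own.

```r
library(dplyr)

# Recreate the question's data so the sketch is self-contained
set.seed(123456)
df1 <- data.frame(
  Ax     = sample(1:3, size = 75, replace = TRUE),
  Fac    = sample(LETTERS[1:4], 75, replace = TRUE),
  yvalue = runif(75)
)

# Keep only the rows whose (Fac, Ax) group has more than 5 observations;
# n() is the size of the current group
df3_alt <- df1 %>%
  group_by(Fac, Ax) %>%
  filter(n() > 5) %>%
  ungroup()

# Check the remaining group sizes
count(df3_alt, Fac, Ax)
```

df3_alt can then be passed to the same ggplot() call in place of df3.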

Upvotes: 1
