user3206440

Reputation: 5059

Box plot from a data frame with precomputed quantiles for multiple groups

I have a dataframe df as below.

df <- data.frame(test = c("Test1", "Test2", "Test1", "Test2"),
                 group = c("A", "A", "B", "B"),
                 varC5th = c(2, 3, 1, 5),
                 varC25th = c(20, 30, 10, 50),
                 varC50th = c(25, 35, 15, 55),
                 varC75th = c(35, 45, 25, 75),
                 varC95th = c(65, 75, 55, 105),
                 varD5th = c(0.2, 0.8, 0.6, 0.4),
                 varD25th = c(2, 8, 6, 4),
                 varD50th = c(3, 9, 7, 5),
                 varD75th = c(5, 11, 9, 7),
                 varD95th = c(9, 15, 13, 11)
                 )

Using df with ggplot2, I need to draw boxplots for varC and varD from the five given quantile values (varX5th ... varX95th), faceted by test, for the two groups A and B.

What I have tried

I got as far as a boxplot for a single group, with the quantile values passed in aes as below.

ggplot(df[1,], 
       aes(x=group, ymin = varC5th, lower = varC25th,
           middle = varC50th, upper = varC75th, ymax = varC95th)) + geom_boxplot(stat = "identity")

This draws the boxplot for a single row only. I need help including all groups and faceting by test so that the plots for both tests appear in one chart. I would also like horizontal lines (caps) at both ends of the whiskers.

Upvotes: 2

Views: 782

Answers (1)

zx8754

Reputation: 56189

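Try the following, which uses all rows, adds geom_errorbar() for the caps at the whisker ends, and facets by test: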

ggplot(df, 
       aes(x = group, ymin = varC5th, lower = varC25th,
           middle = varC50th, upper = varC75th, ymax = varC95th)) +
  geom_boxplot(stat = "identity") +
  geom_errorbar() + 
  facet_grid(.~test)


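Edit: After re-reading your post I realised you want four panels, one for each combination of variable (varC, varD) and test (Test1, Test2). For this we need to reshape the data so that each row holds the five quantiles for one test, group and variable, and then plot: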

library(ggplot2)
library(tidyr)
library(dplyr)

plotDat <-
  gather(df, key = "Key", value = "Value", -c(test, group)) %>% 
  mutate(varType = substr(Key, 1, 4),
         q = make.names(substr(Key, 5, 8))) %>% 
  select(-Key) %>% 
  spread(key = q, value = Value)

plotDat
#    test group varType X25th X50th X5th X75th X95th
# 1 Test1     A    varC    20    25  2.0    35    65
# 2 Test1     A    varD     2     3  0.2     5     9
# 3 Test1     B    varC    10    15  1.0    25    55
# 4 Test1     B    varD     6     7  0.6     9    13
# 5 Test2     A    varC    30    35  3.0    45    75
# 6 Test2     A    varD     8     9  0.8    11    15
# 7 Test2     B    varC    50    55  5.0    75   105
# 8 Test2     B    varD     4     5  0.4     7    11

# now let's plot
ggplot(plotDat, 
       aes(x = group, ymin = X5th, lower = X25th,
           middle = X50th, upper = X75th, ymax = X95th)) +
  geom_boxplot(stat = "identity") +
  geom_errorbar() + 
  facet_grid(varType ~ test, scales = "free_y")
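
In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a minimal sketch (not part of the original answer), the same reshape could probably be written as below. The names long and plotDat2 and the regular expression are assumptions made for illustration; the regular expression assumes every quantile column is named varC... or varD... as in the question.

```r
library(tidyr)

# Same data as in the question.
df <- data.frame(test = c("Test1", "Test2", "Test1", "Test2"),
                 group = c("A", "A", "B", "B"),
                 varC5th = c(2, 3, 1, 5),
                 varC25th = c(20, 30, 10, 50),
                 varC50th = c(25, 35, 15, 55),
                 varC75th = c(35, 45, 25, 75),
                 varC95th = c(65, 75, 55, 105),
                 varD5th = c(0.2, 0.8, 0.6, 0.4),
                 varD25th = c(2, 8, 6, 4),
                 varD50th = c(3, 9, 7, 5),
                 varD75th = c(5, 11, 9, 7),
                 varD95th = c(9, 15, 13, 11))

# Split each column name into the variable ("varC"/"varD") and the
# quantile label ("5th", "25th", ...), stacking the values in one column.
long <- pivot_longer(df,
                     cols = -c(test, group),
                     names_to = c("varType", "q"),
                     names_pattern = "^(var[CD])(.+)$",
                     values_to = "Value")

# One column per quantile; the "X" prefix mirrors the make.names() result
# above, so the column names are X5th, X25th, X50th, X75th and X95th.
plotDat2 <- pivot_wider(long,
                        names_from = q,
                        values_from = Value,
                        names_prefix = "X")
```

plotDat2 should then be usable in place of plotDat in the ggplot call above, since it has the same test, group, varType and X*th columns.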


Upvotes: 4
