Achal Neupane

Reputation: 5719

How to make box plots for two types of samples using percentiles in R

I have data that looks like this:

df <- data.frame(
  Group = c("1", "2", "3", "4"), 
  GOOD_0 = c(1L, 1L, 1L, 1L), 
  GOOD_25 = c(61.25, 1, 1, 1), 
  GOOD_50 = c(119, 1, 1, 1), 
  GOOD_75 = c(153, 1, 1, 1), 
  GOOD_100 = c(237L, 1L, 1L, 1L), 
  SALINE_0 = c(1L, 1L, 1L, 1L), 
  SALINE_25 = c(1, 40.25, 1, 22.5), 
  SALINE_50 = c(1, 86, 52.5, 122.5), 
  SALINE_75 = c(1, 136, 101.5, 269.25), 
  SALINE_100 = c(60L, 360L, 222L, 508L)
)

I want to draw box plots for both the GOOD and SALINE types next to each other (perhaps in two different colours). The numbers after GOOD_ and SALINE_ are percentiles. How can I make box plots for each Group from these percentiles in R?

I can do this for the GOOD type as follows, but I couldn't include the SALINE boxes in the same plot:

library(ggplot2)
ggplot(df, aes(x = Group, ymin = GOOD_0, lower = GOOD_25, middle = GOOD_50, upper = GOOD_75, ymax = GOOD_100)) +
  geom_boxplot(stat = "identity")

Upvotes: 0

Views: 72

Answers (1)

Croote

Reputation: 1424

If you transform your data a little, you can do this easily. ggplot works best with data in long format, so rearrange your data frame so that both sample types share the same percentile columns, and add a column that identifies which type (SALINE or GOOD) each row belongs to.

I am assuming your x variable is Group, since x doesn't exist in the data as you used it in aes(x = x, ...).

library(dplyr)

# Pull out each sample type and give its percentile columns common names
GOOD <- df %>%
  select(Group, starts_with("GOOD")) %>%
  rename(Percentile_0 = GOOD_0,
         Percentile_25 = GOOD_25,
         Percentile_50 = GOOD_50,
         Percentile_75 = GOOD_75,
         Percentile_100 = GOOD_100)
SALINE <- df %>%
  select(Group, starts_with("SALINE")) %>%
  rename(Percentile_0 = SALINE_0,
         Percentile_25 = SALINE_25,
         Percentile_50 = SALINE_50,
         Percentile_75 = SALINE_75,
         Percentile_100 = SALINE_100)


new_df <- bind_rows(GOOD %>% mutate(grp = "GOOD"), SALINE %>% mutate(grp = "SALINE"))

new_df
# A tibble: 8 x 7
  Group Percentile_0 Percentile_25 Percentile_50 Percentile_75 Percentile_100 grp   
  <fct>        <int>         <dbl>         <dbl>         <dbl>          <int> <chr> 
1 1                1          61.2         119            153             237 GOOD  
2 2                1           1             1              1               1 GOOD  
3 3                1           1             1              1               1 GOOD  
4 4                1           1             1              1               1 GOOD  
5 1                1           1             1              1              60 SALINE
6 2                1          40.2          86            136             360 SALINE
7 3                1           1            52.5          102.            222 SALINE
8 4                1          22.5         122.           269.            508 SALINE

There are a few other ways to do the reshaping above, but once it is done, plotting both types is very straightforward, and ggplot will create a legend for you if you specify a colour aesthetic. Hence,

new_df %>%
  ggplot(aes(x = Group, group = grp, colour = grp)) +
  geom_boxplot(stat = "identity",
               aes(ymin = Percentile_0, lower = Percentile_25, middle = Percentile_50,
                   upper = Percentile_75, ymax = Percentile_100))
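
One of the other ways to do the reshaping is a single tidyr::pivot_longer() call instead of the two select/rename steps. This is only a minimal sketch: it assumes tidyr 1.0.0 or later is installed and that every column other than Group is named TYPE_PERCENTILE with exactly one underscore. The names long_df and p are my own and do not come from the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same data as in the question
df <- data.frame(
  Group = c("1", "2", "3", "4"),
  GOOD_0 = c(1L, 1L, 1L, 1L),
  GOOD_25 = c(61.25, 1, 1, 1),
  GOOD_50 = c(119, 1, 1, 1),
  GOOD_75 = c(153, 1, 1, 1),
  GOOD_100 = c(237L, 1L, 1L, 1L),
  SALINE_0 = c(1L, 1L, 1L, 1L),
  SALINE_25 = c(1, 40.25, 1, 22.5),
  SALINE_50 = c(1, 86, 52.5, 122.5),
  SALINE_75 = c(1, 136, 101.5, 269.25),
  SALINE_100 = c(60L, 360L, 222L, 508L)
)

# Split names such as "GOOD_25" at the underscore: the first part goes into
# the grp column, the second part (".value") becomes its own column
long_df <- df %>%
  pivot_longer(
    cols = -Group,
    names_to = c("grp", ".value"),
    names_sep = "_"
  ) %>%
  # The new columns are named "0", "25", ...; prefix them for readability
  rename_with(~ paste0("Percentile_", .x), .cols = -c(Group, grp))

# Build the plot object; call print(p) to draw it in a script
p <- ggplot(long_df, aes(x = Group, colour = grp)) +
  geom_boxplot(
    aes(ymin = Percentile_0, lower = Percentile_25, middle = Percentile_50,
        upper = Percentile_75, ymax = Percentile_100),
    stat = "identity"
  )
```

The result has the same eight rows as new_df (in a different row and column order), so the same geom_boxplot(stat = "identity") call works on it. The advantage is that a third sample type would be picked up without writing another rename block.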

Final data frame:

structure(list(Group = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L), .Label = c("1", "2", "3", "4"), class = "factor"), Percentile_0 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), Percentile_25 = c(61.25, 1, 1, 1, 
1, 40.25, 1, 22.5), Percentile_50 = c(119, 1, 1, 1, 1, 86, 52.5, 
122.5), Percentile_75 = c(153, 1, 1, 1, 1, 136, 101.5, 269.25
), Percentile_100 = c(237L, 1L, 1L, 1L, 60L, 360L, 222L, 508L
), grp = c("GOOD", "GOOD", "GOOD", "GOOD", "SALINE", "SALINE", 
"SALINE", "SALINE")), row.names = c(NA, -8L), class = "data.frame")

Upvotes: 1
