KKolo

Reputation: 35

Boxplot with Summarized and Grouped Data in R

I have the following pre-summarized cost data:

MeanCost  Std  MedianCost  LowerIQR  UpperIQR  StatusGroup  AgeGroup
     700  500         650       510       780  Dead         Young
     800  600         810       666      1000  Alive        Young
     500  200         657       450       890  Comatose     Young
     300  400         560       467       670  Dead         Old
     570  600         500       450       600  Alive        Old
     555  500         677       475       780  Comatose     Old
     333  455         300       200       400  Dead         Middle
     678  256         600       445       787  Alive        Middle
    1500  877         980       870      1200  Comatose     Middle

I want to create a boxplot from this information, similar to the one below, where each color represents a status group (blue = dead, red = alive, green = comatose) and each cluster of boxes represents an age group (left cluster = young, middle cluster = middle, right cluster = old).

(image: example of the desired grouped boxplot)

I know that I don't have min and max, so whiskers are not necessary.
I want to code this in R, and any help would be appreciated! Thank you.

Here is the code I have tried:

 dattest<- data.frame(
  Mean_Cost = c(700,800,500,300,570,555,333,678,1500), 
  Std = c(500,600,200,400,600,500,455,256,877), 
  Median_Cost = c(650,810,657,560,500,677,300,600,980), 
  LowerIQR = c(510,666,450,467,450,475,200,445,870), 
  UpperIQR = c(780,1000,890,670,600,780,400,787,1200), 
  StatusGroup = c(1,2,3,1,2,3,1,2,3),
  AgeGroup = c(1,1,1,2,2,2,3,3,3))

where for StatusGroup 1 = dead, 2 = alive, 3 = comatose,
and for AgeGroup 1 = young, 2 = old, 3 = middle.

 ggplot(dattest, aes(xmin = AgeGroup-.25, xmax=AgeGroup+.25, ymin=LowerIQR, ymax=UpperIQR)) + 
    geom_rect(fill="transparent", col = "blue") + 
    geom_segment(aes(y=Median_Cost, yend=Median_Cost, x=AgeGroup-.25, xend=AgeGroup+.25), col="blue") + 
    geom_point(mapping=aes(x = StatusGroup, y = Mean_Cost), col="red") +
    scale_x_continuous(breaks=1:3, labels=c("Young","Old","Middle")) + 
    theme_classic()

This code is definitely not giving me what I want.

Upvotes: 1

Views: 1102

Answers (1)

pete_999999999

Reputation: 81

Is this what you are trying to do?

library(tidyverse)
df <- tibble::tribble(
  ~MeanCost, ~Std, ~MedianCost, ~LowerIQR, ~UpperIQR, ~StatusGroup, ~AgeGroup,
       700L, 500L,        650L,      510L,      780L,       "Dead",   "Young",
       800L, 600L,        810L,      666L,     1000L,      "Alive",   "Young",
       500L, 200L,        657L,      450L,      890L,   "Comatose",   "Young",
       300L, 400L,        560L,      467L,      670L,       "Dead",     "Old",
       570L, 600L,        500L,      450L,      600L,      "Alive",     "Old",
       555L, 500L,        677L,      475L,      780L,   "Comatose",     "Old",
       333L, 455L,        300L,      200L,      400L,       "Dead",  "Middle",
       678L, 256L,        600L,      445L,      787L,      "Alive",  "Middle",
      1500L, 877L,        980L,      870L,     1200L,   "Comatose",  "Middle"
  )

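df %>% 
  # Order the age groups left to right as Young, Middle, Old
  mutate(AgeGroup = factor(AgeGroup, levels = c("Young", "Middle", "Old"))) %>% 
  ggplot(aes(x = AgeGroup, fill = StatusGroup)) +
  # stat = "identity" draws the boxes from the supplied summary values
  geom_boxplot(aes(
    lower = LowerIQR, 
    upper = UpperIQR, 
    middle = MedianCost, 
    # No min/max available, so the whiskers show median +/- 1 SD instead
    ymin = MedianCost - Std, 
    ymax = MedianCost + Std),
    stat = "identity", width = 0.5)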

test.png
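Since the question says whiskers are not necessary, here is a sketch of a whisker-free variant with the requested colors. It assumes that setting ymin and ymax to the box edges collapses the whiskers to zero length, and that the named colors "blue", "red" and "green" are close enough to the ones in the example image. The data frame is rebuilt with data.frame() only so the snippet runs on its own.

```r
library(ggplot2)

# Same summary values as in the question (Std and MeanCost are not needed here).
df <- data.frame(
  MedianCost  = c(650, 810, 657, 560, 500, 677, 300, 600, 980),
  LowerIQR    = c(510, 666, 450, 467, 450, 475, 200, 445, 870),
  UpperIQR    = c(780, 1000, 890, 670, 600, 780, 400, 787, 1200),
  StatusGroup = rep(c("Dead", "Alive", "Comatose"), times = 3),
  AgeGroup    = rep(c("Young", "Old", "Middle"), each = 3)
)

# Order the clusters left to right as Young, Middle, Old.
df$AgeGroup <- factor(df$AgeGroup, levels = c("Young", "Middle", "Old"))

p <- ggplot(df, aes(x = AgeGroup, fill = StatusGroup)) +
  geom_boxplot(
    aes(lower = LowerIQR, upper = UpperIQR, middle = MedianCost,
        # Assumption: whiskers ending at the box edges have zero length
        ymin = LowerIQR, ymax = UpperIQR),
    stat = "identity", width = 0.5
  ) +
  # Colors requested in the question: blue = dead, red = alive, green = comatose
  scale_fill_manual(values = c(Dead = "blue", Alive = "red", Comatose = "green"))

print(p)
```

Naming the entries of the values vector ties each color to a status regardless of the order of the factor levels.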

Edit

To mark the mean with an "x", add a geom_point() layer and dodge it by the same width as the boxes so that each point lines up with its box:

df %>% 
  mutate(AgeGroup = factor(AgeGroup, levels = c("Young", "Middle", "Old"))) %>% 
  ggplot(aes(x = AgeGroup, fill = StatusGroup)) +
  geom_boxplot(aes(
    lower = LowerIQR, 
    upper = UpperIQR, 
    middle = MedianCost, 
    ymin = MedianCost - Std, 
    ymax = MedianCost + Std),
    stat = "identity", width = 0.5) +
  # shape = 4 is an "x"; the dodge width matches the boxplot width above
  geom_point(aes(y = MeanCost),
             position = position_dodge(width = 0.5),
             shape = 4)

test2.png
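The dattest data frame in the question stores the groups as numeric codes (StatusGroup 1 = dead, 2 = alive, 3 = comatose; AgeGroup 1 = young, 2 = old, 3 = middle). The plots above expect labelled groups, so one way to get there is to turn the codes into factors first. This is a minimal base R sketch that keeps only the two grouping columns for brevity; the label spellings are chosen to match the table in the question.

```r
# Grouping columns as coded in the question's dattest.
dattest <- data.frame(
  StatusGroup = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  AgeGroup    = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

# 1 = dead, 2 = alive, 3 = comatose
dattest$StatusGroup <- factor(dattest$StatusGroup,
                              levels = 1:3,
                              labels = c("Dead", "Alive", "Comatose"))

# 1 = young, 2 = old, 3 = middle. Listing the codes as 1, 3, 2 puts the
# levels in display order (Young, Middle, Old) without changing the data.
dattest$AgeGroup <- factor(dattest$AgeGroup,
                           levels = c(1, 3, 2),
                           labels = c("Young", "Middle", "Old"))

str(dattest)
```

With the factors built this way, the mutate() step that reorders AgeGroup should no longer be needed, since the levels are already in the intended order.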

Upvotes: 3
