Mike Onder

Reputation: 57

Adding lines to grouped boxplots

I have a dataset with three factors (Parent.organization, Hierarchy, variable) and one metric variable (value), and could use some help. Here is some sample data with the same structure:

sampleData <- data.frame(id = 1:100, 
                     Hierarchy = sample(c("Consultant", "Registrar", "Intern", "Resident"), 100, replace = TRUE),
                     Parent.organization = sample(c("Metropolitan", "Regional"), 100, replace = TRUE),
                     variable = sample(c("CXR", "AXR", "CTPA", "CTB"), 100, replace = TRUE),
                     value = rlnorm(20, log(10), log(2.5)))
summary(sampleData)

Using the following code, I get the graph below:

library(ggplot2)
library(scales)

p0 = ggplot(sampleData, aes(x = Hierarchy, y = value, fill = variable)) +
  geom_boxplot() 
plog = p0 + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                      labels = trans_format("log10", math_format(10^.x))) +
  theme_bw() +
  facet_grid(.~Parent.organization, scales = "free", space = "free")

enter image description here

I have a set of values I want to mark for each scan variable (these are the same across all levels of the hierarchy and represent true values). Let's say they are 3, 5, 7, 5 for AXR, CTB, CTPA, CXR respectively. I want these overlaid on top of the boxplots, but I am unsure how to proceed.

I'm after something akin to (I've just filled the first two but the same pattern would apply across the board):

enter image description here

My knowledge of R is improving, but I'd say I'm still fairly inept. Any suggestions on how to improve my question are also very welcome.

Upvotes: 2

Views: 554

Answers (1)

Didzis Elferts

Reputation: 98419

First, make a new data frame for the lines that contains the same grouping and faceting variables as the original data frame. The true values have to be repeated for every combination of Hierarchy and Parent.organization.

true.df<-data.frame(Hierarchy =rep(rep(c("Consultant", "Registrar", "Intern", "Resident"),each=4),times=2),
                    Parent.organization = rep(c("Metropolitan", "Regional"),each=16),
                    variable = rep(c("AXR", "CTB", "CTPA", "CXR"),times=8),
                    true.val=rep(c(3,5,7,5),times=8))
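If counting the rep() repetitions by hand feels error-prone, the same kind of table can be sketched with expand.grid() and merge(). This is only an alternative sketch, not part of the original answer: the names true.vals, combos and true.df2 are made up for illustration, and the values are the ones given in the question.

```r
# Lookup table of the reference ("true") values, one row per scan type
# (values taken from the question; the object names are hypothetical).
true.vals <- data.frame(variable = c("AXR", "CTB", "CTPA", "CXR"),
                        true.val = c(3, 5, 7, 5))

# Every combination of the grouping and faceting variables.
combos <- expand.grid(Hierarchy = c("Consultant", "Registrar", "Intern", "Resident"),
                      Parent.organization = c("Metropolitan", "Regional"),
                      variable = true.vals$variable,
                      stringsAsFactors = FALSE)

# Attach the reference value to each combination by scan type.
true.df2 <- merge(combos, true.vals, by = "variable")
```

The row order differs from true.df above, but the content should be the same: 32 rows, one per Hierarchy / Parent.organization / variable combination.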

Then you can use geom_crossbar() to add the lines. Setting y, ymin and ymax all to true.val collapses each crossbar to a single horizontal line. position = position_dodge() ensures that the lines are dodged by variable, and show.legend = FALSE (named show_guide in ggplot2 versions before 2.0.0) ensures that the legend isn't affected.

plog + geom_crossbar(data = true.df,
                     aes(x = Hierarchy, y = true.val, ymin = true.val,
                         ymax = true.val, fill = variable),
                     show.legend = FALSE, position = position_dodge(), color = "red")

enter image description here

Upvotes: 2
