Rajan

Reputation: 463

Changing whisker length of multiple boxplots in R

I have a data frame of 10 variables, which I plotted as boxplots arranged in two columns. As far as I can tell, ggplot draws the whiskers at the 5th and 95th percentiles. I want the whiskers to follow the Q1 - 1.5*IQR / Q3 + 1.5*IQR rule for each of these plots, with outliers drawn as usual. A similar question has been posted before (linked here), but I couldn't make use of it. Any help will be appreciated!

library(ggplot2)
library(tidyr)

df <- data.frame(matrix(rnorm(2000), ncol = 10))
plot.data <- gather(df, variable, value)
# plot.data$out <- as.numeric(rep(input_data, each = nrow(x_train)))
p <- ggplot(plot.data, aes(x = 0,  y=value))
p <- p + geom_boxplot()
#p <- p + geom_point(aes(x = 0, y = test_data), color = "red") 
p <- p + facet_wrap(~variable, scales = "free_x", strip.position = 'top', ncol = 2)
p <- p + coord_flip()
p <- p + xlab("") + ylab("")
p <- p + theme_bw() + theme(legend.position = "none")  # theme_bw() first, otherwise it resets legend.position
p <- p + theme(axis.text.y=element_blank(),
          axis.ticks.y=element_blank())
p

Upvotes: 1

Views: 1901

Answers (1)

MingH

Reputation: 171

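By default (notch = FALSE), geom_boxplot() should already give you the whiskers you want: it does not use the 5th and 95th percentiles. Each whisker runs from the hinge (Q1 or Q3) to the most extreme data point that lies within 1.5*IQR of that hinge, and anything beyond is drawn as an outlier. See the more recent linked question. The exact values do depend on which quantile and IQR definitions are used.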
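To make that rule concrete, here is a minimal base R sketch of how I understand the default whisker ends to be computed. The variable names (lower_fence, whisker_ends, and so on) are my own, and it assumes R's default (type 7) quantiles, so treat it as an illustration rather than ggplot2's actual code.

```r
# Sketch of the 1.5 * IQR rule on a single numeric vector.
# Assumption: quantile() with its default type 7 definition.
set.seed(123)
x <- rnorm(200)

q   <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- unname(q[3] - q[1])

# Fences: the furthest a whisker is allowed to reach
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[3]) + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences,
# so they are usually a little shorter than the fences themselves
inside       <- x >= lower_fence & x <= upper_fence
whisker_ends <- range(x[inside])

# Everything outside the fences is plotted as an individual point
outliers <- x[!inside]
```

Note the difference from the stat_summary approach: the whiskers here end at actual data values, not at the fences.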

If you insist on setting the whisker ends manually, at exactly Q1 - 1.5*IQR and Q3 + 1.5*IQR, you can do it with stat_summary:

library(ggplot2)
library(tidyr)

# Five values expected by geom = "boxplot", computed via stat_summary
f <- function(x) {
  r <- quantile(x, probs = c(0.25, 0.25, 0.5, 0.75, 0.75))
  r[[1]]<-r[[1]]-1.5*IQR(x) #ymin lower whisker, as per geom_boxplot
  r[[5]]<-r[[5]]+1.5*IQR(x) #ymax upper whisker 
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax") 
  r 
}

# Return only the points outside the whiskers, to plot them as outliers
o <- function(x) {
  r <- quantile(x, probs = c(0.25, 0.75))
  r[[1]]<-r[[1]]-1.5*IQR(x)
  r[[2]]<-r[[2]]+1.5*IQR(x)
  subset(x, x < r[[1]] | r[[2]] < x)
}

# added seed for consistency
set.seed(123)    

df <- data.frame(matrix(rnorm(2000), ncol = 10))
plot.data <- gather(df, variable, value)
# plot.data$out <- as.numeric(rep(input_data, each = nrow(x_train)))
p <- ggplot(plot.data, aes(x = 0,  y=value))
p <- p + stat_summary(fun.data = f, geom = "boxplot") +
  # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = o, geom = "point")
#p <- p + geom_point(aes(x = 0, y = test_data), color = "red") 
p <- p + facet_wrap(~variable, scales = "free_x", strip.position = 'top', ncol = 2)
p <- p + coord_flip()
p <- p + xlab("") + ylab("")
p <- p + theme_bw() + theme(legend.position = "none")  # theme_bw() first, otherwise it resets legend.position
p <- p + theme(axis.text.y = element_blank(),
               axis.ticks.y = element_blank())
p  # print the plot
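If all you need is a different whisker length, and not whiskers drawn exactly at the fences, a simpler route may be the coef argument of geom_boxplot(), which sets the whisker length as a multiple of the IQR (1.5 by default). The sketch below reuses the data setup from the question; the value coef = 1.5 is spelled out only to show where to change it.

```r
library(ggplot2)
library(tidyr)

set.seed(123)
df <- data.frame(matrix(rnorm(2000), ncol = 10))
plot.data <- gather(df, variable, value)

# coef = 1.5 is the default; change it (e.g. to 3) for longer whiskers
p <- ggplot(plot.data, aes(x = 0, y = value)) +
  geom_boxplot(coef = 1.5) +
  facet_wrap(~variable, scales = "free_x", strip.position = "top", ncol = 2) +
  coord_flip() +
  xlab("") + ylab("") +
  theme_bw() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
p
```

With this version the whiskers still end at real data points, and outliers are handled by geom_boxplot() itself, so the helper functions f and o are not needed.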

Upvotes: 4
