SecretIndividual

Reputation: 2519

Improving boxplot readability

When I make a boxplot of departure delays (dep_delay) from the flights data in the nycflights13 package, I get the following plot:

[Image: boxplot of dep_delay with a very large number of outliers]

Code:

library(ggplot2)
library(nycflights13)

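# A single boxplot of departure delay (minutes); x = "" gives one box.
# attach() is not needed because ggplot() finds dep_delay in its data argument.
# Rows with missing dep_delay (cancelled flights) are dropped with a warning.
ggplot(flights, aes(x = "", y = dep_delay)) + 
  geom_boxplot(color="darkblue")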

As the image shows, there are so many outliers that they dominate the plot and make it hard to read anything else.

Are there any methods/techniques to improve the readability of this plot?
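A quick way to quantify the problem is to count how many points the boxplot rule flags. The sketch below applies the 1.5 * IQR whisker rule with quartiles from quantile() and coef = 1.5, which as far as I know matches ggplot2's stat_boxplot() defaults. The helper count_outliers() is hypothetical and the example vector is made up for illustration.

```r
# Hypothetical helper: count the points a boxplot would draw as outliers,
# using the 1.5 * IQR whisker rule (coef = 1.5 is ggplot2's default).
count_outliers <- function(x, coef = 1.5) {
  x <- x[is.finite(x)]                 # drop NA values (e.g. cancelled flights)
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr           # points below this fence are outliers
  upper <- q[2] + coef * iqr           # points above this fence are outliers
  sum(x < lower | x > upper)
}

# Made-up, right-skewed delays (minutes) with one missing value
delays <- c(-5, -3, -1, 0, 2, 4, 6, 10, 15, 300, NA)
count_outliers(delays)
```

With nycflights13 loaded, count_outliers(flights$dep_delay) should give roughly the number of points drawn individually in the question's boxplot.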

Upvotes: 1

Views: 435

Answers (2)

Ben Bolker

Reputation: 226182

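As suggested by @dario, you could transform the y-axis scale. Because dep_delay includes negative values (flights that left early), a plain log transform won't work, but asinh() and a signed square root both handle zero and negative values. The box (interquartile range) covers essentially the same observations on any of these scales, since quartiles are carried along by an increasing transformation. However, the whiskers extend only to the most extreme points within 1.5 * IQR of the box, and that IQR is measured on the transformed scale, so which points count as "outliers" depends on the scale. I found the asinh() transformation a little too extreme, so I tried the signed-square-root transformation as well.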

library(ggplot2)
library(nycflights13)
library(cowplot)

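# Custom transformations; unlike log(), both handle zero and negative delays
tt <- scales::trans_new("asinh", transform=asinh, inverse=sinh)
ss <- scales::trans_new("ssqrt", transform=function(x) sign(x)*sqrt(abs(x)), inverse=function(x) sign(x)*x^2)

# Base plot: a single box, x-axis breaks suppressed
gg0 <- ggplot(flights, aes(y = dep_delay)) + 
    geom_boxplot(color="darkblue")+
    scale_x_continuous(breaks=NULL)

# Transforming the scale (rather than the coordinates) computes the box statistics
# on the transformed values, while the axis labels stay in minutes.
# In ggplot2 >= 3.5.0 this argument is named `transform`; `trans` is deprecated.
plot_grid(nrow=1,
    gg0 + ylab("flight delay\n(original scale)"),
    gg0 + scale_y_continuous(trans=tt) + ylab("flight delay\n(asinh transform)"),
    gg0 + scale_y_continuous(trans=ss) + ylab("flight delay\n(signed sqrt transform)"))

ggsave("scale_boxplot.png")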

[Image: flight delay boxplots on the original, asinh, and signed sqrt scales]
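The point about the whiskers can be checked numerically. Quartiles follow any increasing transformation (exactly so for quantile type 1, which always returns an observed value), so the box spans the same observations on either scale, but the outlier fences are computed from the IQR on the transformed scale. The sketch below uses simulated, made-up data rather than the flights data, reuses the signed square root from the answer above, and the helper n_out() is hypothetical.

```r
# Signed square root and its inverse, as defined in the answer above
ssqrt     <- function(x) sign(x) * sqrt(abs(x))
ssqrt_inv <- function(x) sign(x) * x^2

# Hypothetical helper: number of points outside the 1.5 * IQR fences
n_out <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sum(x < q[1] - coef * iqr | x > q[2] + coef * iqr)
}

set.seed(1)
# Simulated delays: mostly small, plus a long right tail (NOT the flights data)
y <- c(rnorm(1000, mean = 0, sd = 10), rexp(100, rate = 1 / 200))

# Type-1 quartiles are observed values, so transforming commutes with them
q_orig <- quantile(y, c(0.25, 0.75), type = 1, names = FALSE)
all.equal(asinh(q_orig),
          quantile(asinh(y), c(0.25, 0.75), type = 1, names = FALSE))

# ...but the outlier count depends on the scale the IQR is measured on
c(original = n_out(y), asinh = n_out(asinh(y)), ssqrt = n_out(ssqrt(y)))
```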

Upvotes: 3

dario

Reputation: 6483

We have many options, but basically we can

transform the variable:

library(ggplot2)
library(nycflights13)

# asinh() is defined for zero and negative delays, unlike log()
ggplot(flights, aes(x = "", y = asinh(dep_delay))) + 
  geom_boxplot(color="darkblue")

[Image: boxplot of asinh(dep_delay)]
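Putting asinh() inside aes(), rather than in the scale as in the other answer, means the y-axis is labelled in asinh units instead of minutes. The small check below, on arbitrary example values, shows how those units relate to delays: asinh(x) is close to x near zero and close to sign(x) * log(2 * |x|) for large |x|, and sinh() converts an axis value back to minutes.

```r
# Arbitrary delays in minutes, including early departures (negative values)
delay <- c(-30, -1, 1, 60, 1000)

data.frame(
  delay,
  asinh  = asinh(delay),                        # what the transformed y-axis shows
  approx = sign(delay) * log(2 * abs(delay)),   # large-|x| approximation
  back   = sinh(asinh(delay))                   # sinh() undoes asinh()
)

# Reading an axis tick back into minutes, e.g. a tick at asinh = 5
sinh(5)   # about 74 minutes
```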

or use a different type of plot:

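# Jittered, semi-transparent points; a separate geom_point() layer
# would only draw the same points a second time, unjittered
ggplot(flights, aes(x = "", y = dep_delay)) + 
  geom_jitter(alpha=0.7)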

[Image: jittered points of dep_delay]

or transform and use a different type of plot:

ggplot(flights, aes(x = "", y = asinh(dep_delay))) + 
  geom_violin()

[Image: violin plot of asinh(dep_delay)]
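A violin plot is, roughly, a kernel density estimate mirrored around a vertical axis (as far as I know, ggplot2's stat_ydensity() builds on stats::density()). The base-R sketch below computes such a density on the asinh scale for simulated, made-up data. It only illustrates what the violin's outline represents and does not reproduce geom_violin() exactly; for example, geom_violin() trims the tails to the data range by default and rescales widths.

```r
set.seed(2)
# Simulated delays (NOT the flights data): a bulk near zero plus a long tail
y <- c(rnorm(1000, mean = 0, sd = 10), rexp(100, rate = 1 / 200))

# Kernel density of the transformed values: this is the shape a violin
# draws on each side of its centre line
d <- density(asinh(y))

# Sanity check: the estimated density integrates to approximately 1
area <- sum(d$y) * diff(d$x[1:2])
area

# Draw a crude "violin" by mirroring the density curve
plot(c(d$y, -rev(d$y)), c(d$x, rev(d$x)), type = "l",
     xlab = "density (mirrored)", ylab = "asinh(delay)")
```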

Which solution is right depends heavily on what you want to illustrate with the plot (and, of course, on the available data).

Upvotes: 3
