Reputation: 2519
When I make a boxplot of the flights data from the nycflights13 package, I get the following plot. Code:
library(ggplot2)
library(nycflights13)
# attach() is not needed: the data frame is passed to ggplot() directly
ggplot(flights, aes(x = "", y = dep_delay)) +
  geom_boxplot(color = "darkblue")
As the image shows, there are so many outliers that it is hard to read anything else from the plot.
Are there any methods/techniques to improve the readability of this plot?
Upvotes: 1
Views: 435
Reputation: 226182
As suggested by @dario, you could transform the y-axis scale. The box (the interquartile range) still covers the same middle half of the data, but because the whiskers extend at most 1.5 times the IQR beyond the box, and the IQR is computed on the transformed scale, the definition of 'outlier' will differ according to the scale. I found the asinh() transformation a little too extreme, so I tried the signed-square-root transformation as well:
library(ggplot2)
library(nycflights13)
library(cowplot)
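# asinh behaves like log for large |x| but is defined for zero and negative delays
tt <- scales::trans_new("asinh", transform = asinh, inverse = sinh)
# signed square root: a milder compression that keeps the sign of the delay
ss <- scales::trans_new("ssqrt",
                        transform = function(x) sign(x) * sqrt(abs(x)),
                        inverse = function(x) sign(x) * x^2)
# base plot, reused for each panel; the x axis carries no information here
gg0 <- ggplot(flights, aes(y = dep_delay)) +
  geom_boxplot(color = "darkblue") +
  scale_x_continuous(breaks = NULL)
# three panels side by side: original, asinh and signed-sqrt scales
plot_grid(nrow = 1,
          gg0 + ylab("flight delay\n(original scale)"),
          gg0 + scale_y_continuous(trans = tt) + ylab("flight delay\n(asinh transform)"),
          gg0 + scale_y_continuous(trans = ss) + ylab("flight delay\n(signed sqrt transform)"))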
ggsave("scale_boxplot.png")
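To see how much the definition of 'outlier' shifts with the scale, here is a rough sketch that counts the points falling outside the 1.5 * IQR fences on each scale. It uses simulated right-skewed values as a stand-in for dep_delay (an assumption, so that it runs without nycflights13), and count_outliers() and ssqrt() are helper names made up for this illustration. The fence rule is meant to mirror the default geom_boxplot() behaviour, as far as I understand it.

```r
# Simulated "delays": mostly small values around zero plus a long right tail.
# This is only a stand-in for flights$dep_delay, not the real data.
set.seed(1)
delay <- c(rnorm(5000, mean = -3, sd = 5), rexp(1500, rate = 1 / 40))

# Signed square root, same formula as the "ssqrt" transformation
ssqrt <- function(x) sign(x) * sqrt(abs(x))

# Hypothetical helper: count points beyond coef * IQR from the quartiles
count_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  sum(x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr, na.rm = TRUE)
}

# The fences are computed on whatever scale the values are on,
# so the same data give a different outlier count per transformation
outlier_counts <- c(
  original = count_outliers(delay),
  asinh    = count_outliers(asinh(delay)),
  ssqrt    = count_outliers(ssqrt(delay))
)
print(outlier_counts)
```

Replacing delay with flights$dep_delay should give the corresponding counts for the real data; the stronger the compression of the tail, the fewer points end up flagged.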
Upvotes: 3
Reputation: 6483
There are many options, but basically we can transform the data:
library(ggplot2)
library(nycflights13)
ggplot(flights, aes(x = "", y = asinh(dep_delay))) +
geom_boxplot(color="darkblue")
or use a different type of plot:
# geom_jitter() draws the points itself; adding geom_point() as well
# would plot every observation twice
ggplot(flights, aes(x = "", y = dep_delay)) +
  geom_jitter(alpha = 0.7)
or transform and use a different type of plot:
ggplot(flights, aes(x = "", y = asinh(dep_delay))) +
geom_violin()
Which solution is right depends heavily on what you want to illustrate with the plot (and, of course, on the available data).
Upvotes: 3