I would like to draw a boxplot for a given theoretical distribution. For example, say I want a boxplot for the normal distribution. R has the quantile function qnorm, so I can get the first, second, and third quartiles like this:
quartiles <- qnorm((1:3) / 4)
The interquartile range is:
iqr <- quartiles[3] - quartiles[1]
and the whisker ends are:
left.whisker <- quartiles[1] - iqr * 1.5
right.whisker <- quartiles[3] + iqr * 1.5
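As a quick check of these formulas (a sketch, not part of the original question), they can be evaluated for the standard normal. The last quantity, the probability of a value falling beyond either whisker, is an addition that may be handy in class. The quartiles come out at about -0.674, 0 and 0.674, the IQR at about 1.349, the whiskers at about ±2.698, and roughly 0.7% of the distribution lies beyond them.

```r
# Quartiles of N(0, 1)
quartiles <- qnorm((1:3) / 4)
iqr <- quartiles[3] - quartiles[1]
# Whisker ends at 1.5 * IQR beyond the box
left.whisker <- quartiles[1] - iqr * 1.5
right.whisker <- quartiles[3] + iqr * 1.5
# Probability mass outside the whiskers (both tails, using symmetry of the normal)
p.outside <- 2 * pnorm(left.whisker)
round(c(q = quartiles, iqr = iqr, left = left.whisker,
        right = right.whisker, p.outside = p.outside), 4)
```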
Now, how can I create a boxplot?
I know that I can simulate data with rnorm and then call boxplot, but I would like to draw the boxplot from the theoretical distribution itself. This is useful for teaching: students don't have to worry about how many values to simulate, and the output does not depend on the number of simulated values.
Thanks, Nikola
The bxp function can be used to create a boxplot based on supplied summary statistics (the boxplot function calls bxp to do its plotting). So you just need to create the correct type of object and pass it to bxp:
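# Quartiles of the standard normal distribution
q1 <- qnorm(0.25)
q2 <- qnorm(0.5)
q3 <- qnorm(0.75)
# Whisker ends at 1.5 * IQR beyond the box
lower <- q1 - 1.5*(q3-q1)
upper <- q3 + 1.5*(q3-q1)
# bxp() expects a list like the one boxplot(plot = FALSE) returns;
# it uses the 'n' component (one count per box) to know how many boxes to draw
tmp.list <- list(stats = rbind(lower, q1, q2, q3, upper),
                 n = 1, out = numeric(0), group = numeric(0), names = '')
bxp(tmp.list)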
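The same idea extends to any distribution that has an R quantile function. Below is a minimal sketch; theoretical_box_stats is a hypothetical helper name, and the list layout (stats, n, out, group, names) is assumed to follow what bxp() accepts, as in the answer above. One design choice worth noting: for distributions with bounded support, the 1.5 * IQR fence can fall outside the support (for example below 0 for the exponential), so the sketch clamps each whisker to qfun(0) or qfun(1), in the spirit of boxplot() whiskers stopping at the most extreme value inside the fence.

```r
# Hypothetical helper: build a bxp()-ready summary list from a quantile function.
# Extra arguments (e.g. rate = 1) are passed on to the quantile function.
theoretical_box_stats <- function(qfun, label = "", coef = 1.5, ...) {
  q <- qfun(c(0.25, 0.5, 0.75), ...)
  iqr <- q[3] - q[1]
  # Clamp the 1.5 * IQR fences to the support of the distribution
  lower <- max(q[1] - coef * iqr, qfun(0, ...))
  upper <- min(q[3] + coef * iqr, qfun(1, ...))
  list(stats = matrix(c(lower, q, upper), ncol = 1),
       n = 1, out = numeric(0), group = numeric(0), names = label)
}

norm_box <- theoretical_box_stats(qnorm, "N(0,1)")
exp_box  <- theoretical_box_stats(qexp, "Exp(1)", rate = 1)

# Combine both summaries into one object: one column of stats per box
both <- list(stats = cbind(norm_box$stats, exp_box$stats),
             n = c(1, 1), out = numeric(0), group = numeric(0),
             names = c(norm_box$names, exp_box$names))
bxp(both)
```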
Here's a fairly random distribution:
set.seed(1)
d1 <- c(rbeta(5,1,1), runif(5))
boxplot(d1)
If you look at the code of graphics::boxplot.default, you'll see it calls boxplot.stats (in package grDevices), which you can call yourself to get the values required for a boxplot. That function in turn calls stats::fivenum, whose method, applied to a vector x, is:
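x <- sort(x)
# n4 is the (possibly half-integer) position of the lower hinge
n4 <- floor((length(x) + 3) / 2) / 2
# Positions of the minimum, lower hinge, median, upper hinge and maximum
d <- c(1, n4, (length(x) + 1) / 2, length(x) + 1 - n4, length(x))
# Average the two neighbouring order statistics when a position falls halfway between them
0.5 * (x[floor(d)] + x[ceiling(d)])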
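To confirm that the snippet above reproduces fivenum(), it can be wrapped in a function and compared on the same d1. The function name my_fivenum is hypothetical. The sketch also shows that boxplot.stats() takes its hinges and median straight from fivenum(); only stats[1] and stats[5] (the whisker ends) are adjusted when there are outliers.

```r
# Hypothetical wrapper around the fivenum() core shown above
my_fivenum <- function(x) {
  x <- sort(x)
  n4 <- floor((length(x) + 3) / 2) / 2
  d <- c(1, n4, (length(x) + 1) / 2, length(x) + 1 - n4, length(x))
  0.5 * (x[floor(d)] + x[ceiling(d)])
}

set.seed(1)
d1 <- c(rbeta(5, 1, 1), runif(5))

# Should agree with stats::fivenum()
all.equal(my_fivenum(d1), fivenum(d1))

# boxplot.stats() reuses the hinges and median from fivenum()
bs <- boxplot.stats(d1)
bs$stats[2:4]
fivenum(d1)[2:4]
```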
If this is used for a class, I think plotting the theoretical quantiles fails to convey the connection of the distribution to the real world, where variation is everywhere and standard normals are only observed asymptotically.
This is my attempt to show the relationship of a random variable to the theoretical quantiles.
Notice that I am sampling using rnorm, and I also plot the data behind the geom_boxplot using the geom_jitter geom. Changing the alpha settings controls transparency.
# install.packages(c("gridExtra", "ggplot2"))  # run once if the packages are missing
library(gridExtra)
library(ggplot2)
df <- data.frame(our_rand_var = rnorm(10000, mean = 0, sd = 1))
# Top panel: density estimate of the simulated sample
p1 <- ggplot(df, aes(x = our_rand_var)) +
  geom_density(fill = "white") +
  ylab("") +
  xlab("") +
  theme(axis.text = element_text(size = 20),
        axis.title.y = element_blank(),
        axis.text.y = element_blank())
# Bottom panel: jittered raw data with a boxplot drawn over it
# ('linewidth' replaced 'size' for line widths in ggplot2 3.4.0)
p2 <- ggplot(df, aes(x = "Our Variable", y = our_rand_var)) +
  geom_jitter(alpha = 0.2) +
  geom_boxplot(alpha = 0.9, colour = "red", linewidth = 2) +
  ylab("Standard Deviations") +
  coord_flip() +
  theme(axis.text = element_text(size = 20),
        axis.title.y = element_blank(),
        axis.text.y = element_blank())
# Stack the panels; gridExtra >= 2.0 takes the title via 'top' (formerly 'main')
grid.arrange(p1, p2, ncol = 1,
             top = "Standard Normal Distribution (~Z)")
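If the goal is to connect the simulated sample to the theory, one option is to draw the theoretical box next to the simulated one. This sketch assumes ggplot2's documented support for geom_boxplot(stat = "identity") with precomputed ymin, lower, middle, upper and ymax aesthetics; the data frame and column names are illustrative choices, and the theoretical whiskers are placed at the 1.5 * IQR fences as in the question.

```r
library(ggplot2)

set.seed(1)
# Simulated sample, as in the answer above
sim <- data.frame(source = "Simulated (n = 10000)", y = rnorm(10000))

# Theoretical summary of N(0, 1), whiskers at the 1.5 * IQR fences
q <- qnorm(c(0.25, 0.5, 0.75))
iqr <- q[3] - q[1]
theo <- data.frame(source = "Theoretical",
                   ymin = q[1] - 1.5 * iqr, lower = q[1], middle = q[2],
                   upper = q[3], ymax = q[3] + 1.5 * iqr)

p <- ggplot() +
  # Boxplot computed from the simulated data (outliers drawn as points)
  geom_boxplot(data = sim, aes(x = source, y = y)) +
  # Boxplot drawn directly from the precomputed theoretical statistics
  geom_boxplot(data = theo,
               aes(x = source, ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity") +
  coord_flip() +
  xlab("") +
  ylab("Standard Deviations")
print(p)
```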