user2840286

Reputation: 601

Boxplot for a theoretical distribution in R

I would like to be able to draw a boxplot for a given theoretical distribution. For example, let's say I want to draw a boxplot for the normal distribution. R has the function qnorm, so I can get the first, second, and third quartiles like this:

quartiles <- qnorm((1:3) / 4)

The interquartile range can be obtained:

iqr <- quartiles[3] - quartiles[1]

The whisker ends can be obtained:

left.whisker <- quartiles[1] - iqr * 1.5
right.whisker <- quartiles[3] + iqr * 1.5
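As a quick numeric check (a sketch using only base R's qnorm and pnorm), these lines give an IQR of about 1.349 and whisker ends near -2.698 and 2.698 for the standard normal. Because the normal has unbounded support, the theoretical whisker ends coincide with the 1.5 * IQR limits; with real data a whisker stops at the most extreme observation inside them. The last computation is an illustrative addition showing how little of the distribution lies beyond the whiskers.

```r
# Q1, median and Q3 of the standard normal
quartiles <- qnorm((1:3) / 4)
iqr <- quartiles[3] - quartiles[1]          # about 1.349
left.whisker  <- quartiles[1] - 1.5 * iqr   # about -2.698
right.whisker <- quartiles[3] + 1.5 * iqr   # about  2.698

# Probability that a N(0, 1) draw falls beyond either whisker end,
# i.e. would be drawn as an outlier point in a sample boxplot
p.outside <- 2 * pnorm(left.whisker)        # about 0.007

round(c(quartiles, iqr, left.whisker, right.whisker, p.outside), 4)
```

So roughly 0.7% of a large normal sample is expected to show up as outlier points.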

Now, how can I create a boxplot?

I know that I can use rnorm and then call boxplot, but I would like to draw the boxplot from the theoretical distribution itself. This is useful for teaching: students don't have to worry about how many values to simulate, and the output does not depend on the number of values simulated.

Thanks, Nikola

Upvotes: 1

Views: 756

Answers (3)

Greg Snow

Reputation: 49640

The bxp function can be used to create a boxplot based on supplied summary statistics (the boxplot function calls bxp to do its plotting). So you just need to create the correct type of object and pass it to bxp:

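# Quartiles of the standard normal
q1 <- qnorm(0.25)
q2 <- qnorm(0.5)
q3 <- qnorm(0.75)

# Whisker ends at 1.5 IQR beyond the quartiles
lower <- q1 - 1.5*(q3-q1)
upper <- q3 + 1.5*(q3-q1)

# bxp() needs a 5-row stats matrix plus n, out, group and names;
# without n it stops with "invalid first argument"
tmp.list <- list( stats=rbind(lower, q1, q2, q3, upper), n=1,
    out=numeric(0), group=numeric(0), names='')

bxp( tmp.list )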
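The same idea extends to several distributions in one plot. The sketch below is an illustration rather than part of the answer above: the helper theo.box.stats() is a hypothetical name, and clipping the whisker ends to the distribution's support (via qfun(0) and qfun(1)) is an assumption chosen to mimic how a data whisker never extends past the most extreme observation. It also supplies an n component, because bxp() refuses a list without one; with the default varwidth = FALSE its value does not change the drawing.

```r
# Hypothetical helper: the five values bxp() expects (lower whisker end,
# Q1, median, Q3, upper whisker end) computed from a quantile function.
# Whisker ends are clipped to the support (assumption, see text).
theo.box.stats <- function(qfun, ...) {
  q <- qfun(c(0.25, 0.5, 0.75), ...)
  iqr <- q[3] - q[1]
  lower <- max(q[1] - 1.5 * iqr, qfun(0, ...))  # qfun(0) is -Inf for unbounded support
  upper <- min(q[3] + 1.5 * iqr, qfun(1, ...))  # qfun(1) is  Inf for unbounded support
  c(lower, q, upper)
}

# One column per distribution
stats <- cbind(
  normal = theo.box.stats(qnorm),
  t3     = theo.box.stats(qt, df = 3),
  exp1   = theo.box.stats(qexp, rate = 1)
)

z <- list(stats = stats,
          n     = rep(1, ncol(stats)),  # placeholder counts; only used when varwidth = TRUE
          out   = numeric(0), group = numeric(0),
          names = colnames(stats))
bxp(z, horizontal = TRUE)
```

For the exponential, the lower whisker is clipped at 0, while the t distribution with 3 degrees of freedom gets a wider box than the normal.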

Upvotes: -1

dardisco

Reputation: 5274

Here is an arbitrary example sample:

set.seed(1)
d1 <- c(rbeta(5,1,1), runif(5))
boxplot(d1)

If you look at the code of graphics::boxplot.default, you'll see that it calls boxplot.stats (in package grDevices), which you can call directly to get the values needed for a boxplot. That function in turn calls stats::fivenum, whose method, applied to a vector x, is:

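x <- sort(x)
n4 <- floor((length(x) + 3) / 2) / 2                               # position of the hinges
d <- c(1, n4, (length(x) + 1) / 2, length(x) + 1 - n4, length(x))  # positions of min, lower hinge, median, upper hinge, max
0.5 * (x[floor(d)] + x[ceiling(d)])                                 # average the two neighbours at half-integer positions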
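To confirm that these steps reproduce stats::fivenum, they can be wrapped in a function and compared on d1. tukey.fivenum() is a hypothetical name used only for this check, and the NA handling of the real fivenum() is left out. The comparison with boxplot.stats() shows that it keeps the hinges and median from fivenum() and only adjusts the whisker ends.

```r
set.seed(1)
d1 <- c(rbeta(5, 1, 1), runif(5))

# Hypothetical re-implementation of the steps quoted above
# (x is assumed to contain no NA values)
tukey.fivenum <- function(x) {
  x <- sort(x)
  n <- length(x)
  n4 <- floor((n + 3) / 2) / 2
  d <- c(1, n4, (n + 1) / 2, n + 1 - n4, n)
  0.5 * (x[floor(d)] + x[ceiling(d)])
}

stopifnot(isTRUE(all.equal(tukey.fivenum(d1), fivenum(d1))))

# boxplot.stats() takes its hinges and median from fivenum(); if any points
# fall outside the 1.5 * IQR fences, the whisker ends become the most
# extreme points inside them and the rest are returned in $out
bs <- boxplot.stats(d1)
bs$stats
bs$out
```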

Upvotes: 1

Statwonk

Reputation: 724

If this is for a class, I think plotting only the theoretical quantiles fails to convey how the distribution connects to the real world, where variation is everywhere and standard normals are only observed asymptotically.

This is my attempt to show the relationship of a random variable to the theoretical quantiles.

Notice that I sample with rnorm and also plot the data behind the geom_boxplot layer using the geom_jitter geom. Changing the alpha settings changes the transparency.

# install.packages(c("gridExtra", "ggplot2"))  # run once if the packages are missing
library(gridExtra); library(ggplot2)

set.seed(1)  # make the simulated sample reproducible
df <- data.frame(our_rand_var = rnorm(10000, mean = 0, sd = 1))

# Top panel: kernel density of the simulated values
p1 <- ggplot(df, aes(x = our_rand_var)) +
  geom_density(fill = "white") +
  ylab("") +
  xlab("") +
  theme(axis.text = element_text(size = 20),
        axis.title.y = element_blank(),
        axis.text.y = element_blank())

# Bottom panel: jittered raw values with the boxplot drawn on top
p2 <- ggplot(df, aes(x = "Our Variable", y = our_rand_var)) +
  geom_jitter(alpha = 0.2) +
  geom_boxplot(alpha = 0.9, colour = "red", linewidth = 2) +  # use size = 2 on ggplot2 < 3.4.0
  ylab("Standard Deviations") +
  coord_flip() +
  theme(axis.text = element_text(size = 20),
        axis.title.y = element_blank(),
        axis.text.y = element_blank())

# Stack the panels; gridExtra >= 2.0 uses `top` (older versions used `main`)
grid.arrange(p1, p2, ncol = 1,
             top = "Standard Normal Distribution (~Z)")
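For comparison, here is a base-graphics sketch (not the ggplot2 code above) that draws the boxplot of a simulated sample next to the theoretical N(0, 1) boxplot with bxp(), so students can see how far a finite sample strays from the theory. The sample size n.sim = 100 and the seed are arbitrary choices for illustration; rerunning with a larger n.sim should bring the two boxes closer together.

```r
set.seed(42)
n.sim <- 100
x <- rnorm(n.sim)

# Sample statistics: whisker ends at the most extreme points inside the fences
sample.bs <- boxplot.stats(x)

# Theoretical statistics for N(0, 1): the whisker ends are the fences themselves
q <- qnorm(c(0.25, 0.5, 0.75))
iqr <- q[3] - q[1]
theory <- c(q[1] - 1.5 * iqr, q, q[3] + 1.5 * iqr)

z <- list(
  stats = cbind(sample.bs$stats, theory),
  n     = c(sample.bs$n, 1),                # bxp() requires n; only used for varwidth
  out   = sample.bs$out,
  group = rep(1, length(sample.bs$out)),    # all outliers belong to the first box
  names = c(paste0("rnorm(", n.sim, ")"), "theoretical N(0, 1)")
)
bxp(z, horizontal = TRUE)
```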

Upvotes: 2
