Reputation: 76

Boxplots based on TRUE/FALSE variable columns

I have data that can be roughly replicated using

n = 10
df = data.frame(
  val= rnorm(n), 
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))

How can I draw this as a boxplot? I want var1, var2 and var3 on the x-axis and val on the y-axis, i.e. the box-and-whisker for var1 should contain only the observations that have a 1 in the var1 column.

I've tried

df2 = melt(df, id.vars = c('val'), 
           variable.name ='vars', value.name = "include")

ggplot(df2, aes(x = include, y = val)) + geom_boxplot(aes(fill = vars))

but when I plot this I get 3 boxplots that look exactly the same.

Where am I going wrong?

Upvotes: 2

Answers (2)

jay.sf

Reputation: 73272

In base R we can do

boxplot(sapply(df[-1], function(x) df$val[as.logical(x)]))
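To make it clearer what the one-liner above hands to boxplot, here is a minimal sketch on a small made-up data frame (the values and the names toy and groups are illustrative assumptions, not part of the question's data). It uses lapply instead of sapply so the result is always a list; sapply may simplify to a matrix when every group happens to have the same size, which boxplot also accepts.

```r
# Toy data (hypothetical values, chosen only for illustration)
toy <- data.frame(
  val  = c(1.5, -0.5, 0.3, 2.0),
  var1 = c(1L, 0L, 1L, 1L),
  var2 = c(0L, 1L, 1L, 0L)
)

# For each 0/1 column, keep only the val entries where the flag is 1
groups <- lapply(toy[-1], function(flag) toy$val[as.logical(flag)])

str(groups)

# boxplot() draws one box per list element, labelled by element name
boxplot(groups, ylab = "val")
```

Each list element holds the val entries for one indicator column, so a row with a 1 in several columns contributes to several boxes.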

Or using ggplot2

df.r <- do.call(rbind, lapply(names(df)[-1], function(x) 
  data.frame(x, y=df$val[as.logical(df[,x])])))

library(ggplot2)
ggplot(df.r, aes(x=x, y=y)) + geom_boxplot(aes(fill=x))

Data

df <- structure(list(val = c(1.37095844714667, -0.564698171396089, 
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484, 
1.51152199743894, -0.0946590384130976, 2.01842371387704, -0.062714099052421
), var1 = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), var2 = c(1L, 
1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L), var3 = c(0L, 0L, 0L, 1L, 
0L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))

Upvotes: 2

Rui Barradas

Reputation: 76565

The boxes are all the same because the plot never filters on the include column: after melting, every level of vars still contains all the values of val. Subsetting the data passed to ggplot's data argument to the rows where include is not 0 fixes this.

library(ggplot2)

ggplot(subset(df2, include != 0), aes(vars, val)) +
  geom_boxplot()

Data creation code.

I am reposting the data creation code, this time setting the RNG seed so the results are reproducible.

set.seed(1234)
n = 10
df = data.frame(
  val= rnorm(n), 
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))
df

df2 <- reshape2::melt(df, id.vars = c('val'), 
           variable.name ='vars', value.name = "include")
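As a quick sanity check on the subset step, the sketch below confirms that the number of rows kept per variable equals the number of 1s in that column. It is a sketch under assumptions: the long format is built by hand with base R (the names df_long and kept are hypothetical) rather than with reshape2::melt, so that it runs without extra packages.

```r
set.seed(1234)
n <- 10
df <- data.frame(
  val  = rnorm(n),
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5)
)

flag_cols <- c("var1", "var2", "var3")

# Hand-built long format: one row per (val, variable) pair,
# mirroring the columns produced by the melt() call above
df_long <- data.frame(
  val     = rep(df$val, times = length(flag_cols)),
  vars    = factor(rep(flag_cols, each = n), levels = flag_cols),
  include = unlist(df[flag_cols], use.names = FALSE)
)

# Keep only the rows where the indicator is 1
kept <- subset(df_long, include != 0)

# Rows kept per variable should match the count of 1s per column
table(kept$vars)
colSums(df[flag_cols])
```

If the two printed counts agree, each box in the plot is drawn from exactly the observations flagged with a 1 for that variable.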

Upvotes: 0
