Flobagob

Reputation: 76

Boxplots based on TRUE/FALSE variable columns

I have data that can be roughly replicated using:

n = 10
df = data.frame(
  val= rnorm(n), 
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))

How can I plot this as a boxplot? I want var1, var2 and var3 on the x-axis and val on the y-axis, i.e. the box for var1 should contain only the observations that have a 1 in the var1 column (and likewise for var2 and var3).

I've tried:

library(reshape2)
library(ggplot2)

df2 = melt(df, id.vars = c('val'), 
           variable.name ='vars', value.name = "include")

ggplot(df2, aes(x = include, y = val)) + geom_boxplot(aes(fill = vars))

But when I plot this, I get three boxplots that look exactly the same.

Where am I going wrong?

Upvotes: 2

Views: 1289

Answers (2)

jay.sf

Reputation: 73272

In base R we can do:

boxplot(sapply(df[-1], function(x) df$val[as.logical(x)]))
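sapply() simplifies its result to a matrix when every indicator column has the same number of 1s (or even to a plain vector when every group has exactly one value, which would give a single box), and returns a list otherwise. lapply() always returns a named list, which keeps the shape predictable. A minimal sketch of that variant, using the question's simulated data with a seed added (an assumption, only for reproducibility) and boxplot(plot = FALSE) to inspect the per-box counts without drawing:

```r
set.seed(1234)  # assumption: any seed works; it only makes the draw reproducible
n <- 10
df <- data.frame(
  val  = rnorm(n),
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))

# One numeric vector per indicator column: the values of val where that column is 1.
# lapply() always returns a named list, whatever the group sizes.
groups <- lapply(df[-1], function(x) df$val[as.logical(x)])

# plot = FALSE returns the boxplot statistics without drawing anything;
# call boxplot(groups) instead to get the actual plot.
b <- boxplot(groups, plot = FALSE)
b$n      # observations per box, equal to colSums(df[-1])
b$names  # the box labels: "var1" "var2" "var3"
```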


Or, using ggplot2:

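# Long data frame: one row per (indicator column, included value of val)
df.r <- do.call(rbind, lapply(names(df)[-1], function(x) 
  data.frame(x, y=df$val[as.logical(df[,x])])))

library(ggplot2)
# Fill by the indicator name x (discrete), not by the continuous y
ggplot(df.r, aes(x=x, y=y)) + geom_boxplot(aes(fill=x))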
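The do.call(rbind, lapply(...)) step builds a long data frame with one row per included observation. A shorter base-R way to get the same rows, offered as an alternative rather than the answerer's own code, is stack() applied to the named list of subsets; its output columns are called values and ind. The seed is an assumption added for reproducibility, and the plotting call is wrapped in requireNamespace() only so the sketch still runs where ggplot2 is not installed.

```r
set.seed(1234)  # assumption: seed added only for reproducibility
n <- 10
df <- data.frame(
  val  = rnorm(n),
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))

# Named list: for each indicator column, the values of val where it is 1
groups <- lapply(df[-1], function(x) df$val[x == 1])

# stack() turns the named list into a long data frame:
#   values = the val observations, ind = factor naming the source column
long <- stack(groups)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = ind, y = values)) +
    geom_boxplot(aes(fill = ind))
  print(p)
}
```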



Data

df <- structure(list(val = c(1.37095844714667, -0.564698171396089, 
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484, 
1.51152199743894, -0.0946590384130976, 2.01842371387704, -0.062714099052421
), var1 = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), var2 = c(1L, 
1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L), var3 = c(0L, 0L, 0L, 1L, 
0L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))

Upvotes: 2

Rui Barradas

Reputation: 76565

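The boxes are all the same because the plot ignores the include column: after melt(), each level of vars still contains all ten values of val. Subsetting the data passed to ggplot() to the rows with include != 0 keeps, for each variable, only the observations flagged with 1.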

library(ggplot2)

ggplot(subset(df2, include != 0), aes(vars, val)) +
  geom_boxplot()
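To see why the subset matters, the sketch below rebuilds df2 in base R (an assumption: a stand-in for reshape2::melt() so no extra package is needed, with the same seed as the data code in this answer) and compares the per-group counts and medians before and after dropping include == 0.

```r
set.seed(1234)
n <- 10
df <- data.frame(
  val  = rnorm(n),
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))

# Base-R stand-in for reshape2::melt(df, id.vars = "val", ...):
# val repeated once per indicator column, plus the column name and its 0/1 value
ind_cols <- names(df)[-1]
df2 <- data.frame(
  val     = rep(df$val, times = length(ind_cols)),
  vars    = factor(rep(ind_cols, each = n), levels = ind_cols),
  include = unlist(df[ind_cols], use.names = FALSE))

# Before subsetting: every group holds all n values, so the boxes coincide
table(df2$vars)                    # n rows per level
tapply(df2$val, df2$vars, median)  # the same median for every level

# After subsetting: each group keeps only the rows flagged with 1
kept <- subset(df2, include != 0)
table(kept$vars)                   # equals colSums(df[ind_cols])
```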


Data creation code.

I am reposting the data creation code, with the RNG seed set so the results are reproducible.

set.seed(1234)
n = 10
df = data.frame(
  val= rnorm(n), 
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))
df

df2 <- reshape2::melt(df, id.vars = c('val'), 
           variable.name ='vars', value.name = "include")

Upvotes: 0
