Reputation: 76
I have data that can be roughly replicated using
n = 10
df = data.frame(
val= rnorm(n),
var1 = rbinom(n, 1, 0.5),
var2 = rbinom(n, 1, 0.5),
var3 = rbinom(n, 1, 0.5))
How can I plot this on a boxplot? What I'm looking for will have var1, var2 and var3 on the x-axis and val on the y-axis, i.e. the box-and-whisker for var1 will contain only observations that have a 1 in the var1 column.
I've tried
df2 = melt(df, id.vars = c('val'),
variable.name ='vars', value.name = "include")
ggplot(df2, aes(x = include, y = val)) + geom_boxplot(aes(fill = vars))
but when I plot this I get 3 boxplots that look exactly the same.
Where am I going wrong?
Upvotes: 2
Views: 1289
Reputation: 73272
In base R we can do
boxplot(sapply(df[-1], function(x) df$val[as.logical(x)]))
Or using ggplot2
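# For each indicator column, keep the val entries where the indicator is 1
df.r <- do.call(rbind, lapply(names(df)[-1], function(x)
  data.frame(x, y=df$val[as.logical(df[,x])])))
library(ggplot2)
# Fill by the grouping variable x; a continuous fill such as y cannot colour whole boxes
ggplot(df.r, aes(x=x, y=y)) + geom_boxplot(aes(fill=x))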
Data
df <- structure(list(val = c(1.37095844714667, -0.564698171396089,
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484,
1.51152199743894, -0.0946590384130976, 2.01842371387704, -0.062714099052421
), var1 = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), var2 = c(1L,
1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L), var3 = c(0L, 0L, 0L, 1L,
0L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
Upvotes: 2
Reputation: 76565
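The boxes are all the same because the plot never takes the include column into account, so every level of vars is drawn from all the values of val. The fix is to subset the data frame passed to ggplot's data argument so that only the rows with include equal to 1 remain.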
library(ggplot2)
ggplot(subset(df2, include != 0), aes(vars, val)) +
geom_boxplot()
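To see why the unfiltered plot shows identical boxes, it can help to count the rows per level of vars before and after subsetting. The following is only a sketch: it rebuilds the long data frame in base R instead of calling reshape2::melt (an assumption made so the snippet has no package dependency), and the name kept is introduced purely for illustration.

```r
# Rebuild the example data (same seed as the data creation code in this answer).
set.seed(1234)
n <- 10
df <- data.frame(
  val  = rnorm(n),
  var1 = rbinom(n, 1, 0.5),
  var2 = rbinom(n, 1, 0.5),
  var3 = rbinom(n, 1, 0.5))

# Long format built with base R; it has the same columns as the melted df2.
flags <- names(df)[-1]
df2 <- data.frame(
  val     = rep(df$val, times = length(flags)),
  vars    = factor(rep(flags, each = nrow(df)), levels = flags),
  include = unlist(df[flags], use.names = FALSE))

# Before filtering: every level of vars carries all n values of val,
# so the three boxes summarise exactly the same numbers.
table(df2$vars)

# After filtering: each level keeps only the rows flagged with 1.
kept <- subset(df2, include != 0)
table(kept$vars)
```

The second table should match the number of 1s in each of var1, var2 and var3, which is what the boxplot in the question was meant to show.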
Data creation code.
I am reposting the data creation code with the RNG seed set, so that the results are reproducible.
set.seed(1234)
n = 10
df = data.frame(
val= rnorm(n),
var1 = rbinom(n, 1, 0.5),
var2 = rbinom(n, 1, 0.5),
var3 = rbinom(n, 1, 0.5))
df
df2 <- reshape2::melt(df, id.vars = c('val'),
variable.name ='vars', value.name = "include")
Upvotes: 0