Reputation: 1875
I'm looking to create a set of box plots, with one boxplot for each variable in sampledf1 against the single variable in sampledf2.
The actual use case is I've created a set of clusters with k-means and now want to see their distribution for each of the found clusters with each variable in the dataframe I'm using for clustering.
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10,NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30,NA))))
Then I want to see a box plot with each of the variables in sampledf1 paired with the only variable in sampledf2.
I would like to use something like:
sapply(boxplot(sampledf1~sampledf2$V1))
but this gives me this error:
Error in match.fun(FUN) : argument "FUN" is missing, with no default
Any way I could do this with dplyr would be great, but I didn't see any functions that I could chain together to do this.
Upvotes: 0
Views: 582
Reputation: 31454
You can use ggplot2 and facets if you first reshape your data into long format:
library(reshape2)
library(ggplot2)
s.all = cbind(sampledf1, f2=sampledf2$V1)
s.long = melt(s.all, id = 'f2')
ggplot(s.long) +
  geom_boxplot(aes(x=f2, group=f2, y=value)) +
  facet_wrap(~variable) +
  scale_x_continuous(breaks=unique(s.long$f2))
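Since the question asked for something chainable with dplyr, here is a rough sketch of the same reshape done with tidyr's pivot_longer() instead of reshape2's melt(). This swaps in a different package than the one used above. The names s_long and p are made up for illustration, and set.seed() is only there so the sample data is reproducible.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: added only so the sketch is reproducible
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

# Attach the grouping variable, then stack V1..V6 into variable/value pairs
s_long <- sampledf1 %>%
  mutate(f2 = sampledf2$V1) %>%
  pivot_longer(cols = -f2, names_to = "variable", values_to = "value")

# One facet per original column, one box per value of f2
p <- ggplot(s_long, aes(x = factor(f2), y = value)) +
  geom_boxplot() +
  facet_wrap(~variable)
print(p)
```

ggplot2 will warn about the rows it drops because of the NA values in the sample data; that is expected here.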
Upvotes: 2
Reputation: 10671
purrr's walk() works nicely when you start trying to pass formulas like this. walk() works like sapply, iterating over the elements in an object, just with more flexible syntax. The . refers to the iterated element from names(sampledf1).
This will work to get each panel named by the column in sampledf1 it represents:
library(purrr)
par(mfrow = c(2,3))
purrr::walk(names(sampledf1), ~boxplot(sampledf1[,.]~sampledf2$V1, main = .))
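If the bare . inside the nested formula is hard to read, the same loop can be written with an ordinary function and a named argument. This is only a sketch: nm, old_par and res are names I made up, set.seed() is there just for reproducibility, and the sample data is regenerated so the snippet runs on its own.

```r
library(purrr)

set.seed(1)  # assumption: added only so the sketch is reproducible
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

# par() returns the previous settings, so they can be restored afterwards
old_par <- par(mfrow = c(2, 3))

# walk() is called for its side effect (the plots) and returns its input invisibly
res <- walk(names(sampledf1), function(nm) {
  boxplot(sampledf1[[nm]] ~ sampledf2$V1,
          main = nm, xlab = "sampledf2$V1", ylab = nm)
})

par(old_par)
```

Because walk() hands back its first argument unchanged, nothing gets printed to the console, which is the main practical difference from sapply here.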
Upvotes: 1
Reputation: 605
ggplot2 variant:
library(reshape2)
library(ggplot2)
sampledf1$X <- sampledf2$V1
ggplot(melt(sampledf1, id.vars="X", na.rm=TRUE), aes(factor(X), value)) +
  geom_boxplot() + facet_wrap(~ variable, nrow=2)
Upvotes: 0
Reputation: 10781
Here's a way using lapply and seq_along. We iterate through the columns of sampledf1 using seq_along. We can extract the variable names using our index, i, and the names function.
par(mfrow = c(2,3))
lapply(seq_along(sampledf1),
       FUN = function(i)
         boxplot(sampledf1[,i] ~ sampledf2$V1, main = names(sampledf1)[i])
)
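One thing to be aware of: boxplot() returns a list of summary statistics, so a bare lapply call at the console prints all six of them after drawing. A small sketch of keeping those results in a named list instead, assuming you might want to inspect them later. box_stats is a made-up name and set.seed() is only for reproducibility.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

old_par <- par(mfrow = c(2, 3))

# Assigning the result keeps the returned statistics out of the console
box_stats <- lapply(seq_along(sampledf1), function(i) {
  boxplot(sampledf1[, i] ~ sampledf2$V1, main = names(sampledf1)[i])
})
names(box_stats) <- names(sampledf1)

par(old_par)  # restore the previous plotting layout

# Each element holds what boxplot() computed for that column,
# e.g. box_stats$V1$stats (the five-number summaries per group)
```

If you only care about the plots, wrapping the lapply call in invisible() has the same effect on the console output.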
Upvotes: 3