Jazzmine

Reputation: 1875

Create R boxplots of dataframe with another variable

I'm looking to create a set of box plots, one for each variable in sampledf1 against the single variable in sampledf2.

The actual use case: I've created a set of clusters with k-means and now want to see, for each cluster, the distribution of each variable in the dataframe I used for clustering.

sampledf1 <- as.data.frame(replicate(6, sample(c(1:10,NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30,NA))))

Then I want to see a box plot with each of the variables in sampledf1 paired with the only variable in sampledf2.

I would like to use something like:

sapply(boxplot(sampledf1~sampledf2$V1))

but this gives me this error:

Error in match.fun(FUN) : argument "FUN" is missing, with no default

Any way I could do this with dplyr would be great, but I didn't see any functions that I could chain together to do this.

Upvotes: 0

Views: 582

Answers (4)

dww

Reputation: 31454

You can use ggplot2 with facets if you first reshape your data into long format:

library(reshape2)
library(ggplot2)
s.all = cbind(sampledf1, f2=sampledf2$V1)
s.long = melt(s.all, id = 'f2')
ggplot(s.long) +
  geom_boxplot(aes(x=f2, group=f2, y=value)) +
  facet_wrap(~variable) +
  scale_x_continuous(breaks=unique(s.long$f2))
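To see what the long format looks like before plotting, here is a minimal sketch that rebuilds the question's sample data and inspects the melted result. The set.seed() call is an addition for reproducibility and is not part of the original code.

```r
library(reshape2)

set.seed(1)  # assumption: added only so the sketch is reproducible
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

# Attach the grouping variable, then stack V1..V6 into variable/value pairs
s.all  <- cbind(sampledf1, f2 = sampledf2$V1)
s.long <- melt(s.all, id.vars = "f2")

str(s.long)  # columns: f2, variable (factor with levels V1..V6), value
```

Each row of s.long holds one observation together with the name of the column it came from, which is what lets facet_wrap(~variable) draw one panel per original column.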


Upvotes: 2

Nate

Reputation: 10671

walk() from purrr works nicely when you start passing formulas like this. It iterates over the elements of an object much like sapply, but it is called for its side effects (here, drawing the plots) and has a more flexible syntax. The . refers to the current element of names(sampledf1).

This will work to get each panel named by the column in sampledf1 it represents:

library(purrr)
par(mfrow = c(2,3))
purrr::walk(names(sampledf1), ~boxplot(sampledf1[,.]~sampledf2$V1, main = .))
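The formula shorthand can be hard to read because of the nested ~. A rough equivalent with an explicit anonymous function is sketched below; nm is just an illustrative argument name, and the question's sample data are rebuilt so the sketch runs on its own.

```r
library(purrr)

sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

par(mfrow = c(2, 3))
# walk() calls the function once per column name, for its side effect (the plot)
out <- walk(names(sampledf1), function(nm) {
  boxplot(sampledf1[[nm]] ~ sampledf2$V1, main = nm)
})
```

walk() returns its input invisibly, so nothing is printed to the console apart from the plots themselves.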


Upvotes: 1

Thales

Reputation: 605

ggplot2 variant:

library(reshape2)
library(ggplot2)

sampledf1$X <- sampledf2$V1
ggplot(melt(sampledf1, id.vars="X", na.rm=TRUE), aes(factor(X),value)) + 
  geom_boxplot() + facet_wrap( ~ variable, nrow=2)
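Since the question mentions dplyr, a comparable reshape can be sketched with tidyr's pivot_longer() in place of melt(). This is an alternative that assumes dplyr and tidyr are installed, not the method used above. It builds a new data frame (long, an illustrative name) rather than adding X to sampledf1, and it drops rows where either the group or the value is NA.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

long <- sampledf1 %>%
  mutate(X = sampledf2$V1) %>%   # grouping variable, added to a copy
  pivot_longer(-X, names_to = "variable", values_to = "value") %>%
  filter(!is.na(X), !is.na(value))   # drop incomplete rows

p <- ggplot(long, aes(factor(X), value)) +
  geom_boxplot() +
  facet_wrap(~ variable, nrow = 2)
print(p)
```

Because sampledf1 itself is left with its six original columns, base-graphics loops over its columns elsewhere on this page are unaffected.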


Upvotes: 0

bouncyball

Reputation: 10781

Here's a way using lapply and seq_along. We iterate through the columns of sampledf1 using seq_along. We can extract the variable names using our index, i, and the names function.

par(mfrow = c(2,3))
lapply(seq_along(sampledf1), 
       FUN  = function(i) 
           boxplot(sampledf1[,i] ~ sampledf2$V1, main = names(sampledf1)[i])
       )
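lapply() returns the list of statistics that each boxplot() call computes, so the console fills with output, and the par(mfrow) setting stays in effect for later plots. Here is a small sketch of one way to handle both, assuming the question's sample data: keep the result in a variable (or wrap the call in invisible()) and restore the previous graphics settings afterwards. The names old_par and stats are illustrative.

```r
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

old_par <- par(mfrow = c(2, 3))  # par() returns the previous settings
stats <- lapply(seq_along(sampledf1), function(i) {
  boxplot(sampledf1[, i] ~ sampledf2$V1, main = names(sampledf1)[i])
})
par(old_par)                     # restore the earlier layout

length(stats)      # one element per column of sampledf1
names(stats[[1]])  # components returned by boxplot(), e.g. "stats", "n", "out"
```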


Upvotes: 3
