Mobeus Zoom

Reputation: 608

Apply ggplot2 across columns

I am working with a data frame with many columns and would like to produce certain ggplot2 plots of the data, namely boxplots, histograms and density plots. I would like to do this by writing a single function that is applied across all attributes (columns), producing one boxplot (or histogram, etc.) per column and storing each plot as an element of a list, so that I can later index the list by number (or by column name) to return the plot for a given attribute.

The issue I have is that, if I try to apply across columns with something like apply(df,2,boxPlot), I have to define boxPlot as a function that takes just a vector x. And when I do so, the attribute/column name and index are no longer retained. So e.g. in the code for producing a boxplot, like

bp <- ggplot(df, aes(x=Group, y=Attr, fill=Group)) + 
  geom_boxplot() + 
  labs(title="Plot of length per dose", x="Group", y =paste(Attr)) + 
  theme_classic()

the function has no idea how to extract the info necessary for Attr from just vector x (as this is just the column data and doesn't carry the column name or index).

(Note the x-axis is a factor variable called 'Group', which has 6 levels A,B,C,D,E,F, within X.)

Can anyone help with a good way of automating this procedure? (Ideally it should work for all types of ggplots; the problem here seems to simply be how to refer to the attribute name, within the ggplot function, in a way that can be applied / automatically replicated across the columns.) A for-loop would be acceptable, I guess, but if there's a more efficient/better way to do it in R then I'd prefer that!

Edit: I'm after something like what the top answer to this question achieves: apply box plots to multiple variables. In that answer, however, you would still need a for-loop to change the index in y=y[2] in the ggplot code to get all the boxplots. He has also used expand.grid to include different `x` possibilities (I have only one, the Group factor), but that would be easy to simplify if the looping problem could be handled.

I'd also prefer just base R if possible--dplyr if absolutely necessary.

Upvotes: 0

Views: 760

Answers (1)

IceCreamToucan

Reputation: 28685

Here's an example of iterating over all columns of a data frame to produce a list of plots while retaining the column name in the ggplot axis label:

library(tidyverse)

plots <- 
  imap(select(mtcars, -cyl), ~ {
    ggplot(mtcars, aes(x = cyl, y = .x)) + 
      geom_point() +
      ylab(.y)
  })

plots$mpg


You can also do this without purrr and dplyr, using base R's Map:

to_plot <- setdiff(names(mtcars), 'cyl')

plots <- 
  Map(function(.x, .y) {
    ggplot(mtcars, aes(x = cyl, y = .x)) + 
      geom_point() +
      ylab(.y)
  }, mtcars[to_plot], to_plot)

plots$mpg
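
For the boxplot case in the question, the same idea can be sketched by iterating over column names rather than column values, so that the name is available for both the aesthetic mapping and the labels. This is only a sketch under assumptions: the data frame `df`, its columns `len` and `width`, and the helper `box_plots` are made up for illustration, and it relies on the `.data` pronoun inside `aes()`, which needs a reasonably recent ggplot2 (3.0.0 or later).

```r
library(ggplot2)

# Hypothetical example data: a 6-level factor 'Group' plus two numeric attributes.
set.seed(1)
df <- data.frame(
  Group = factor(rep(LETTERS[1:6], each = 10)),
  len   = rnorm(60),
  width = rnorm(60, mean = 5)
)

# Hypothetical helper: build one boxplot per non-grouping column and
# return them as a list named by column.
box_plots <- function(data, group = "Group") {
  cols <- setdiff(names(data), group)
  plots <- lapply(cols, function(col) {
    # .data[[col]] looks the column up by its name (a string) at build time
    ggplot(data, aes(x = .data[[group]], y = .data[[col]], fill = .data[[group]])) +
      geom_boxplot() +
      labs(title = paste("Plot of", col, "per", group),
           x = group, y = col, fill = group) +
      theme_classic()
  })
  setNames(plots, cols)
}

plots <- box_plots(df)

plots$len    # index by column name
plots[[2]]   # or by position
```

Swapping `geom_boxplot()` for another geom (and adjusting the aesthetics accordingly) should give histograms or density plots with the same looping structure.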

Upvotes: 3
