Peter

Reputation: 373

R - how to filter data with a list of arguments to produce multiple data frames and graphs

I am looking for a way to use a list of filter arguments to produce different objects. I have a data set for which I want to make several graphs. However, I want all these graphs based on subsets of the dataset. For illustrative purposes I have made the following data.

df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                 yield = c("15", "10", "5", "0"),
                 temperature = c("2", "21", "26", "13"),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

Also, I have a list of filter arguments.

filters <- c("brand=='b1'",
             "profit",
             "Season=='Summer'",
             "profit==FALSE",
             "yield >= 10",
             "")

What I would want is that I could use a for loop to have all these filters produce objects with the filtered data, and subsequently plot graphs. I have tried this in the following way.

for(i in 1:length(filters)){
  assign(paste0("df", i), filter(df, factor(filters[i])))
  assign(paste0("plot", i), ggplot(database, aes(x = temperature, y = yield)) + geom_point())
}

However, this did not work because the filter() function does not accept <fct> as an argument, nor <chr> (e.g., "brand=='b1'"). What I would want is brand=='b1', so filter() accepts it as an argument. Does anybody have an idea to do this?

Also, as an additional question, I would like to automate the whole process and end with a combined graph, so grid.arrange() at the end. Of course I could automate ncol and nrow with some division of length(filters). But how do I get all the produced plots into grid.arrange()? This should probably be outside the for loop, right? Any ideas here?

Upvotes: 1

Views: 4883

Answers (3)

G. Grothendieck

Reputation: 269852

Assume the input data in the Note at the end, which fixes some inconsistencies in the data shown in the question, makes temperature and yield numeric, simplifies profit == FALSE to just !profit, and replaces the empty filter with TRUE. Define a function Plot which takes a filter, subsets df and plots the result. Then apply it to each filter and pass the resulting plots to grid.arrange. This uses ggplot2 and gridExtra but no additional packages, and it does not use eval explicitly.

(An alternative to the grid.arrange line would be cowplot::plot_grid(plotlist=plots) which gives a slightly different layout.)

library(ggplot2)
library(gridExtra)

Plot <- function(x) {
  data <- do.call("subset", list(df, parse(text = x)))
  ggplot(data, aes(temperature, yield)) + geom_line() + geom_point() + ggtitle(x)
}

plots <- Map(Plot, filters)
do.call("grid.arrange", plots)


Note

df <- data.frame(brand = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c(2, 21, 26, 13),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

filters <- c("brand=='b1'",
             "profit",
             "Season=='Summer'",
             "!profit",
             "yield >= 10",
             TRUE)
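
As a quick sanity check on the Note data, the sketch below counts how many rows each filter keeps, using the same do.call/subset/parse approach as Plot. The helper name count_rows is made up for illustration and is not part of the answer above.

```r
# Note data, repeated so the block runs on its own.
df <- data.frame(brand = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c(2, 21, 26, 13),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

# c() coerces the trailing TRUE to the string "TRUE",
# which parse() turns back into a logical constant.
filters <- c("brand=='b1'", "profit", "Season=='Summer'",
             "!profit", "yield >= 10", TRUE)

# Hypothetical helper: number of rows a filter string keeps.
count_rows <- function(x) {
  nrow(do.call("subset", list(df, parse(text = x))))
}

row_counts <- vapply(filters, count_rows, integer(1))
print(row_counts)
# Each of the first five filters keeps 2 rows; "TRUE" keeps all 4.
```

If a filter string contains a typo in a column name, subset fails with an "object not found" error here, which is easier to spot than a blank plot.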

Upvotes: 1

Edo

Reputation: 7858

You can do it by using eval and parse.

Also, an lapply over a custom function is more reasonable than a for loop with assign. The result is a list of ggplot objects.

To lay out all the charts together, grid.arrange from the gridExtra package works fine: pass the list of charts to its grobs argument.

library(dplyr)
library(ggplot2)

df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c("2", "21", "26", "13"),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

filters <- list("type=='b1'",
                "profit",
                "Season=='Summer'",
                "profit==FALSE",
                "yield >= 10",
                "TRUE")


myfun <- function(fltr, df){

  df <- filter(df, eval(parse(text = fltr)))
  ggplot(df, aes(x = temperature, y = yield)) + geom_point()

}


ggs <- lapply(filters, myfun, df = df)

gridExtra::grid.arrange(grobs = ggs)


I made a couple of changes to your data: yield must be numeric, since yield >= 10 is only meaningful for numeric vectors, and the last filter (which was empty) is now "TRUE" (I assumed you wanted to keep every row in that case).
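
The question also asked about deriving ncol and nrow from length(filters). One possible sketch follows, assuming a roughly square layout is wanted. grid_dims is a hypothetical helper, and ggs_demo is a stand-in list of placeholder plots rather than the filtered plots above.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: a near-square grid that fits n plots.
grid_dims <- function(n) {
  n_col <- ceiling(sqrt(n))
  n_row <- ceiling(n / n_col)
  list(ncol = n_col, nrow = n_row)
}

# Stand-in for the list of ggplot objects (six placeholder plots).
demo <- data.frame(x = 1:3, y = c(2, 4, 3))
ggs_demo <- lapply(1:6, function(i) {
  ggplot(demo, aes(x, y)) + geom_point() + ggtitle(paste("plot", i))
})

# Six plots give a 3-column, 2-row grid.
dims <- grid_dims(length(ggs_demo))
grid.arrange(grobs = ggs_demo, ncol = dims$ncol, nrow = dims$nrow)
```

Since the plots live in a list, the grid.arrange call naturally sits after the lapply rather than inside any loop. Supplying only ncol also works, because grid.arrange then works out the number of rows itself.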

Upvotes: 1

MrFlick

Reputation: 206401

Rather than storing your filters as character strings, it would be better to store them as quosures. For example

library(rlang)
filters <- quos(type=='b1',
             profit,
             Season=='Summer',
             profit==FALSE,
             yield >= 10,
             TRUE)

Then you can fairly easily map over these with purrr::map

library(dplyr)
library(purrr)
library(ggplot2)
# Each .x is a single quosure, so unquote it with !! (splicing with !!! is meant for lists)
map(filters, ~df %>% filter(!!.x) %>%
      ggplot(aes(x = temperature, y = yield)) + geom_point())
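
To connect this to the combined graph asked for in the question, here is a minimal end-to-end sketch. It assumes yield and temperature are stored as numbers (the question stores them as strings), that !profit is an acceptable replacement for profit==FALSE, and that a fixed three-column layout is fine. The names subsets and plots are illustrative only.

```r
library(rlang)
library(dplyr)
library(purrr)
library(ggplot2)

# Assumption: numeric yield and temperature, unlike the question's data.
df <- data.frame(type = c("b1", "b2", "b1", "b2"),
                 yield = c(15, 10, 5, 0),
                 temperature = c(2, 21, 26, 13),
                 Season = c("Winter", "Summer", "Summer", "Autumn"),
                 profit = c(TRUE, TRUE, FALSE, FALSE))

filters <- quos(type == "b1", profit, Season == "Summer",
                !profit, yield >= 10, TRUE)

# One filtered data frame per quosure; !! unquotes a single quosure.
subsets <- map(filters, function(f) filter(df, !!f))

# One plot per subset, titled with the text of its filter.
plots <- map2(subsets, filters, function(d, f) {
  ggplot(d, aes(x = temperature, y = yield)) +
    geom_point() +
    ggtitle(as_label(f))
})

# Draw all plots on one page (requires the gridExtra package).
gridExtra::grid.arrange(grobs = unname(plots), ncol = 3)
```

Keeping the filtered data frames in subsets, rather than piping straight into ggplot, makes it easy to inspect what each filter selected before plotting.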

Upvotes: 3
