jpf5046
jpf5046

Reputation: 797

Dynamic group_by argument based on a list, R, for loop/function

Using dplyr

Here is my dataset:

Class   Time    Honors  Grade    Total Students
Math    AM      Yes     PassFail    23
English AM      No      Letter      31
Science AM      Yes     Letter      22
Gym     AM      No      PassFail    26
Math    PM      Yes     PassFail    19
English PM      No      Letter      23
Science PM      Yes     Letter      24
Gym     PM      No      PassFail    13
Math    AM      Yes     PassFail    24
English AM      Yes     Letter      27
Science AM      No      Letter      28
Math    PM      No      Letter      21
English PM      Yes     PassFail    23
Science PM      No      PassFail    22

I want to run four queries, with four increasingly specific answers. The first query will have one group_by argument, the second two group_by arguments, three for the third and so on.

#query 1 
df %>%
  group_by(Class) %>%
  summarise(NewValue = mean(`Total Students`))

#results
    Class NewValue
    <chr>    <dbl>
1 English    26.00
2     Gym    19.50
3    Math    21.75
4 Science    24.0

The second query is the same basic calculation with one more group_by argument.

#query2
df %>%
  group_by(Class, Time) %>%
  summarise(NewValue = mean(`Total Students`))

#results
    Class  Time NewValue
    <chr> <chr>    <dbl>
1 English    AM     29.0
2 English    PM     23.0
3     Gym    AM     26.0
4     Gym    PM     13.0
5    Math    AM     23.5
6    Math    PM     20.0
7 Science    AM     25.0
8 Science    PM     23.0

And the pattern continues #query3 will be

 df %>%
  group_by(Class, Time, Honors) %>%
  summarise(NewValue = mean(`Total Students`))

And #query4 will be

df %>%
  group_by(Class, Time, Honors, Grade) %>%
  summarise(NewValue = mean(`Total Students`))

Question:

Is there a way to write one query and use a for loop to incorporate the increasing levels of detail in the group_by argument?

For example, the pseudo code below does not work; I was hoping there is a solution similar:

resultsarray <- data.frame()
Groupbysteps <- c( "Class", 
                   "Class, Time", 
                   "Class, Time, Honors", 
                   "Class, Time, Honors, Grade")

for (i in Groupbysteps) {
      resultsarray <- df%>%
                       group_by( Groupbysteps) %>%
                       summarise(NewValue = mean(`Total Students`))

 all <- rbind.fill(all, resultsarray)
}

Upvotes: 1

Views: 1166

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269441

Try syms in rlang as shown:

library(dplyr)
library(rlang)

L <- lapply(1:4, function(i) df %>% 
                               group_by(!!!syms(names(df)[1:i])) %>% 
                               summarize(newValue = mean(Total_Students))
)

giving a list L of 4 data frames whose column names are:

> lapply(L, names)
[[1]]
[1] "Class"    "newValue"

[[2]]
[1] "Class"    "Time"     "newValue"

[[3]]
[1] "Class"    "Time"     "Honors"   "newValue"

[[4]]
[1] "Class"    "Time"     "Honors"   "Grade"    "newValue"

Upvotes: 1

kangaroo_cliff
kangaroo_cliff

Reputation: 6222

This could work.

Groupbysteps <- c( "Class", "Time", "Honors", "Grade")

for (i in 1 : length(Groupbysteps)) {
      resultsarray <- df%>%
                       group_by(.dots = Groupbysteps[1 : i]) %>%
                       summarise(NewValue = mean(`Total Students`))

 all <- rbind.fill(all, resultsarray)
}

Following works:

Example data set

df <- iris[1:20, ]
colnames(df) <- c( "Class", "Time", "Honors", "Grade", "Total Students")
df[, 1] <- as.factor(sample(c("a", "b"), rep=T)) 
df[, 2] <- as.factor(sample(c("a", "b"), rep=T)) 
df[, 3] <- as.factor(sample(c("a", "b"), rep=T)) 
df[, 4] <- as.factor(sample(c("a", "b"), rep=T)) 
df[, 5] <- rnorm(20)

The code:

Groupbysteps <- c( "Class", "Time", "Honors", "Grade")
resultsarray <- data.frame()

for (i in 1 : length(Groupbysteps)) {

  resultsarray <- df %>%
    group_by(.dots = Groupbysteps[1 : i]) %>%
    summarise(NewValue = mean(`Total Students`))

  all <- rbind.fill(all, resultsarray)
}

Upvotes: 1

Related Questions