Reputation: 797
Using dplyr
Here is my dataset:
Class Time Honors Grade Total Students
Math AM Yes PassFail 23
English AM No Letter 31
Science AM Yes Letter 22
Gym AM No PassFail 26
Math PM Yes PassFail 19
English PM No Letter 23
Science PM Yes Letter 24
Gym PM No PassFail 13
Math AM Yes PassFail 24
English AM Yes Letter 27
Science AM No Letter 28
Math PM No Letter 21
English PM Yes PassFail 23
Science PM No PassFail 22
I want to run four queries, with four increasingly specific answers. The first query will have one group_by argument, the second two group_by arguments, three for the third and so on.
#query 1
df %>%
group_by(Class) %>%
summarise(NewValue = mean(`Total Students`))
#results
Class NewValue
<chr> <dbl>
1 English 26.00
2 Gym 19.50
3 Math 21.75
4 Science 24.0
The second query is the same basic calculation with one more group_by argument.
#query2
df %>%
group_by(Class, Time) %>%
summarise(NewValue = mean(`Total Students`))
#results
Class Time NewValue
<chr> <chr> <dbl>
1 English AM 29.0
2 English PM 23.0
3 Gym AM 26.0
4 Gym PM 13.0
5 Math AM 23.5
6 Math PM 20.0
7 Science AM 25.0
8 Science PM 23.0
And the pattern continues #query3
will be
df %>%
group_by(Class, Time, Honors) %>%
summarise(NewValue = mean(`Total Students`))
And #query4
will be
df %>%
group_by(Class, Time, Honors, Grade) %>%
summarise(NewValue = mean(`Total Students`))
Question:
Is there a way to write one query and use a for loop to incorporate the increasing levels of detail in the group_by
argument?
For example, the pseudo code below does not work; I was hoping there is a solution similar:
resultsarray <- data.frame()
Groupbysteps <- c( "Class",
"Class, Time",
"Class, Time, Honors",
"Class, Time, Honors, Grade")
for (i in Groupbysteps) {
resultsarray <- df%>%
group_by( Groupbysteps) %>%
summarise(NewValue = mean(`Total Students`))
all <- rbind.fill(all, resultsarray)
}
Upvotes: 1
Views: 1166
Reputation: 269441
Try syms
in rlang as shown:
library(dplyr)
library(rlang)
L <- lapply(1:4, function(i) df %>%
group_by(!!!syms(names(df)[1:i])) %>%
summarize(newValue = mean(Total_Students))
)
giving a list L
of 4 data frames whose column names are:
> lapply(L, names)
[[1]]
[1] "Class" "newValue"
[[2]]
[1] "Class" "Time" "newValue"
[[3]]
[1] "Class" "Time" "Honors" "newValue"
[[4]]
[1] "Class" "Time" "Honors" "Grade" "newValue"
Upvotes: 1
Reputation: 6222
This could work.
Groupbysteps <- c( "Class", "Time", "Honors", "Grade")
for (i in 1 : length(Groupbysteps)) {
resultsarray <- df%>%
group_by(.dots = Groupbysteps[1 : i]) %>%
summarise(NewValue = mean(`Total Students`))
all <- rbind.fill(all, resultsarray)
}
Following works:
Example data set
df <- iris[1:20, ]
colnames(df) <- c( "Class", "Time", "Honors", "Grade", "Total Students")
df[, 1] <- as.factor(sample(c("a", "b"), rep=T))
df[, 2] <- as.factor(sample(c("a", "b"), rep=T))
df[, 3] <- as.factor(sample(c("a", "b"), rep=T))
df[, 4] <- as.factor(sample(c("a", "b"), rep=T))
df[, 5] <- rnorm(20)
The code:
Groupbysteps <- c( "Class", "Time", "Honors", "Grade")
resultsarray <- data.frame()
for (i in 1 : length(Groupbysteps)) {
resultsarray <- df %>%
group_by(.dots = Groupbysteps[1 : i]) %>%
summarise(NewValue = mean(`Total Students`))
all <- rbind.fill(all, resultsarray)
}
Upvotes: 1