Reputation: 1287
I need to group by variable x or variable y depending upon a condition. This is not happening when I use a magrittr pipe.
Consider a dataframe df1:
> df1
seat_id student_id seat_state
1 1222 500 9
2 850 500 9
3 850 500 9
4 1225 500 9
5 16502 500 9
6 17792 500 9
7 17792 500 9
8 1219 501 10
9 847 501 9
10 847 501 9
11 1220 501 9
12 17785 501 9
13 17785 501 9
14 1214 502 9
15 842 502 9
16 842 502 9
17 1215 502 9
18 1211 503 9
19 839 503 9
20 839 503 9
Now suppose I want to summarise this in one of two ways, (1) by student_id or (2) by seat_state, depending upon a variable summary.
The old and long way is
if (summary==1) {
  df1 %>% group_by(student_id) %>% summarise(seats=n())
} else if (summary==2) {
  df1 %>% group_by(seat_state) %>% summarise(seats=n())
}
But there has to be a more compact way, especially because I have several magrittr pipes coming after the summarise statement, and repeating them in each branch would double the size of the code.
Upvotes: 3
Views: 6070
Reputation: 39154
In the latest version of dplyr (0.7.1), we can use quo and unquote (!!) to pass the grouping variable. Here is an example of a function using quo from dplyr. You can type vignette("programming") to learn more about this.
# Load package
library(dplyr)
# Create a function
# It takes two arguments. The first is the data frame.
# The second specifies the condition: 1 means group by student_id,
# while 2 means group by seat_state
my_summary <- function(df1, condition){
  if (condition == 1){
    group_var <- quo(student_id)
  } else if (condition == 2){
    group_var <- quo(seat_state)
  } else {
    # Fail early instead of hitting an undefined group_var below
    stop("condition must be 1 or 2")
  }
  df1 %>%
    group_by(!!group_var) %>%
    summarise(seats=n())
}
# Test the function
my_summary(df1, 1)
# A tibble: 4 x 2
student_id seats
<int> <int>
1 500 7
2 501 6
3 502 4
4 503 3
my_summary(df1, 2)
# A tibble: 2 x 2
seat_state seats
<int> <int>
1 9 19
2 10 1
Upvotes: 2
Reputation: 887118
We can replace the if/else by subsetting the list of quos
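f1 <- function(df, cond) {
  # Pick the grouping quosure by position: 1 = student_id, 2 = seat_state
  grp <- quos(student_id, seat_state)[[cond]]
  df %>%
    group_by(!!grp) %>%   # !! is the operator form of UQ()
    summarise(seats = n())
}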
f1(df1, 1)
# A tibble: 4 x 2
# student_id seats
# <int> <int>
#1 500 7
#2 501 6
#3 502 4
#4 503 3
f1(df1, 2)
# A tibble: 2 x 2
# seat_state seats
# <int> <int>
#1 9 19
#2 10 1
Upvotes: 1
Reputation: 339
my_col <- 1 # the column number
df1 %>% group_by(.[,my_col]) %>% summarise(seats=n())
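The same idea, choosing the grouping column through a value held in a variable, could also be sketched with across() and all_of(), which select the column by name. This assumes dplyr 1.0.0 or later (newer than the 0.7.1 mentioned above). The helper name summarise_by and the small df_demo data frame are made up for illustration and are not from the original posts.

```r
library(dplyr)

# Hypothetical helper: group by whichever column name is passed in `col`.
# Only one pipeline is written, so later steps need not be duplicated.
summarise_by <- function(df, col) {
  df %>%
    group_by(across(all_of(col))) %>%
    summarise(seats = n(), .groups = "drop")
}

# Small stand-in for df1 (illustrative values only)
df_demo <- data.frame(
  student_id = c(500, 500, 501),
  seat_state = c(9, 9, 10)
)

# Choose the grouping column from a condition, then run the single pipeline
summary <- 1
grp_col <- c("student_id", "seat_state")[summary]
res <- summarise_by(df_demo, grp_col)
```

Because the column is selected by name, the result keeps that name (student_id or seat_state) as its first column, and any further pipes can be chained after summarise_by() once.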
Upvotes: 0