Lazarus Thurston

Reputation: 1287

How to use group_by with an if-then-else condition while keeping to the dplyr philosophy

I need to group by variable x or variable y depending on a condition, but I cannot get this to work inside a magrittr pipe.

Consider a dataframe df1:

> df1


   seat_id student_id seat_state
1     1222        500          9
2      850        500          9
3      850        500          9
4     1225        500          9
5    16502        500          9
6    17792        500          9
7    17792        500          9
8     1219        501         10
9      847        501          9
10     847        501          9
11    1220        501          9
12   17785        501          9
13   17785        501          9
14    1214        502          9
15     842        502          9
16     842        502          9
17    1215        502          9
18    1211        503          9
19     839        503          9
20     839        503          9

Now suppose I want to summarise this in one of two ways, (1) by student_id or (2) by seat_state, depending on the value of a variable called summary.

The old and long way is

if (summary == 1) {
  df1 %>% group_by(student_id) %>% summarise(seats = n())
} else if (summary == 2) {
  df1 %>% group_by(seat_state) %>% summarise(seats = n())
}

But there has to be a more compact way, especially because several more piped steps follow the summarise statement, so repeating the chain in each branch would double the size of the code.

Upvotes: 3

Views: 6070

Answers (3)

www

Reputation: 39154

In the latest version of dplyr (0.7.1), we can use quo and the unquote operator (!!) to pass the grouping variable. Here is an example of a function that uses quo from dplyr. You can type vignette("programming") to learn more about this.

# Load package
library(dplyr)

# Create a function
# This function has two arguments. The first one is the data frame.
# The second one specifies the condition: 1 means group by student_id,
# while 2 means group by seat_state.
my_summary <- function(df1, condition){

  if (condition == 1){
    group_var <- quo(student_id)
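  } else if (condition == 2){
    group_var <- quo(seat_state)
  } else {
    # Fail early instead of hitting an undefined group_var below
    stop("condition must be 1 or 2")
  }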
  df1 %>%
    group_by(!!group_var) %>%
    summarise(seats=n())
}

# Test the function
my_summary(df1, 1)

# A tibble: 4 x 2
  student_id seats
       <int> <int>
1        500     7
2        501     6
3        502     4
4        503     3

my_summary(df1, 2)
# A tibble: 2 x 2
  seat_state seats
       <int> <int>
1          9    19
2         10     1
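
If the grouping column is easier to carry around as a string, a variation on the same idea is to convert the name to a symbol and unquote it. The following is only a minimal sketch under stated assumptions: summarise_seats, toy and by_col are hypothetical names used for illustration, and rlang::sym is used in place of quo.

```r
library(dplyr)

# Hypothetical helper: `by` is the grouping column name given as a string
summarise_seats <- function(df, by) {
  df %>%
    group_by(!!rlang::sym(by)) %>%   # turn the string into a symbol, then unquote
    summarise(seats = n())
}

# Small made-up data frame, not the df1 from the question
toy <- data.frame(
  student_id = c(500, 500, 501),
  seat_state = c(9, 9, 10)
)

# Map the numeric flag from the question onto a column name
summary <- 2
by_col <- switch(summary, "student_id", "seat_state")

summarise_seats(toy, by_col)
```

With this shape the if/else shrinks to a single switch call, and the pipe itself is written only once.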

Upvotes: 2

akrun

Reputation: 887118

We can replace the if/else by building a list of quosures with quos and picking one out by position with cond.

f1 <- function(df, cond) {
    grp <- quos(student_id, seat_state)[[cond]]      
    df %>%
        group_by(!!grp) %>%
        summarise(seats = n())
}

f1(df1, 1)
# A tibble: 4 x 2
#  student_id seats
#       <int> <int>
#1        500     7
#2        501     6
#3        502     4
#4        503     3

f1(df1, 2)
# A tibble: 2 x 2
#  seat_state seats
#       <int> <int>
#1          9    19
#2         10     1
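
The same lookup also works without a wrapper function, which speaks to the concern in the question about the steps that follow summarise. This is a rough sketch, assuming a made-up data frame toy and an arrange step standing in for whatever comes later in the real pipeline.

```r
library(dplyr)

# Small made-up data frame, not the df1 from the question
toy <- data.frame(
  student_id = c(500, 500, 500, 501, 501, 502),
  seat_state = c(9, 9, 9, 10, 9, 9)
)

summary <- 1

# Pick the grouping quosure once, before the pipe starts
grp <- quos(student_id, seat_state)[[summary]]

result <- toy %>%
  group_by(!!grp) %>%
  summarise(seats = n()) %>%
  arrange(desc(seats))   # later steps are written once, not once per branch

result
```

Only the line that selects grp depends on the condition, so the rest of the chain does not need to be duplicated.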

Upvotes: 1

Kevin

Reputation: 339

my_col <- 1 # the column number
df1 %>% group_by(.[,my_col]) %>% summarise(seats=n())
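
Grouping on an expression such as .[,my_col] may leave the grouping column labelled with the expression text rather than the original column name. One possible workaround, sketched here with a made-up data frame toy and a hypothetical grp_name variable, is to look the name up by position first and group by that name.

```r
library(dplyr)

# Small made-up data frame, not the df1 from the question
toy <- data.frame(
  seat_id    = c(1222, 850, 850, 1219),
  student_id = c(500, 500, 500, 501),
  seat_state = c(9, 9, 9, 10)
)

my_col <- 2                        # the column number
grp_name <- names(toy)[my_col]     # look up the column name by position

by_position <- toy %>%
  group_by(!!rlang::sym(grp_name)) %>%   # group by that name, keeping its label
  summarise(seats = n())

by_position
```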

Upvotes: 0
