Jon

Reputation: 67

How to pass a variable to filter() within an R function

I am fairly new to R. I wrote the function below, which tries to summarise a dataframe based on a feature variable (passed to the function as variable) and a target variable (passed to the function as target_var). I also pass it a value (target_val) on which to filter.

The function falls over on the filter line (filter(target_var == target_val)). I think it has something to do with quo, quosure etc., but I can't figure out how to fix it. The following code should be ready to run: if you exclude the filter line it works, and if you include the filter line it falls over.

library(dplyr)
target <- c('good', 'good', 'bad', 'good', 'good', 'bad')
var_1 <- c('debit_order', 'other', 'other', 'debit_order','debit_order','debit_order')

dset <- data.frame(target, var_1)
odds_by_var <- function(dataframe, variable, target_var, target_val){

  df_name <- paste('odds', deparse(substitute(variable)), sep = "_")
  variable_string <- deparse(substitute(variable))
  target_string <- deparse(substitute(target_var))

  temp_df1 <- dataframe %>%
    group_by_(variable_string, target_string) %>%
    summarise(cnt = n()) %>%
    group_by_(variable_string) %>%
    mutate(total = sum(cnt)) %>%
    mutate(rate = cnt / total) %>%
    filter(target_var == target_val) 

  assign(df_name, temp_df1, envir=.GlobalEnv)

}

odds_by_var(dset, var_1, target, 'bad')

Upvotes: 0

Views: 905

Answers (1)

heck1

Reputation: 726

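I assume you want to filter on whether target is good or bad. As I understand it, you should generally filter() before you group_by() and summarise, because the summary may omit the variables you want to filter on. Here the totals need every row, so I keep the target column in the summary and filter at the end instead. I restructured your function a little: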

dset <- data.frame(target, var_1)
odds_by_var <- function(dataframe, variable, target_var, target_val){

  df_name <- paste('odds', deparse(substitute(variable)), sep = "_")
  variable_string <- deparse(substitute(variable))
  target_string <- deparse(substitute(target_var))

  temp_df1 <- dataframe %>%
    group_by_(variable_string, target_string) %>%
    summarise(cnt = n()) %>%
    mutate(total = sum(cnt),
           rate = cnt / total)
  names(temp_df1) <- c(variable_string, "target", "cnt", "total", "rate")
  temp_df1 <- temp_df1[temp_df1$target == target_val, ]
  assign(df_name, temp_df1, envir = .GlobalEnv)

}

odds_by_var(dset, var_1, target, "bad")

result:

> odds_var_1
# A tibble: 2 x 5
# Groups:   var_1 [2]
  var_1       target   cnt total  rate
  <chr>       <chr>  <int> <int> <dbl>
1 debit_order bad        1     4  0.25
2 other       bad        1     2  0.5 
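
As a side note on why the original filter line fails: as far as I can tell, target_var is not a column of the summarised data, so filter() falls back to the function argument, which evaluates to the global target vector (length 6) rather than the target column. If you are on a recent dplyr (1.0 or later, which I am assuming here), the embrace operator `{{ }}` lets you pass bare column names through to dplyr verbs without deparse(substitute()) or group_by_(). The sketch below is one way to write it; the name odds_by_var2 is made up for illustration, and it returns the result rather than assigning into the global environment.

```r
library(dplyr)

# Same toy data as in the question.
dset <- data.frame(
  target = c("good", "good", "bad", "good", "good", "bad"),
  var_1  = c("debit_order", "other", "other",
             "debit_order", "debit_order", "debit_order"),
  stringsAsFactors = FALSE
)

# odds_by_var2 is a hypothetical name, not part of the original code.
# `variable` and `target_var` are bare column names; `target_val` is a plain value.
odds_by_var2 <- function(dataframe, variable, target_var, target_val) {
  dataframe %>%
    # Count rows per (variable, target_var) combination.
    count({{ variable }}, {{ target_var }}, name = "cnt") %>%
    # Totals are computed per level of `variable`, over all target values.
    group_by({{ variable }}) %>%
    mutate(total = sum(cnt), rate = cnt / total) %>%
    ungroup() %>%
    # {{ target_var }} refers to the column, not the caller's variable.
    filter({{ target_var }} == target_val)
}

odds_var_1 <- odds_by_var2(dset, var_1, target, "bad")
```

Returning the data frame and assigning it at the call site keeps the function free of side effects; if you still want the odds_<variable> naming from the question, you could wrap the call in assign() yourself.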

Upvotes: 1
