89_Simple

Reputation: 3805

dplyr select column using string and apply base function

Suppose the mathematical operation I need to perform is specified as a character string:

math.operation <- 'mean' # this could be mean, sum or length

I want to apply this math.operation to a column whose name is also provided as a string, using dplyr:

my.column <- 'col1'
 
dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)

I first selected the column based on my.column, added back my grouping variable id, and then tried to apply the operation by group:

dat %>% dplyr::select(contains(my.column)) %>% 
dplyr::mutate(id = dat$id) %>%
dplyr::group_by(id) %>% 
dplyr::summarise(match.fun(math.operation)(my.column)) 

I am stuck on the last line, which produces NAs.

Upvotes: 1

Views: 108

Answers (1)

TimTeaFan

Reputation: 18581

Option 1: You can use do.call with !! sym(). Note that I removed your first select and mutate calls, since they are redundant for this example.

Option 2: Instead of do.call you could use call. Here you would not need to wrap the argument in list(), but you would need eval, so the statement is not really shorter.

Option 3: A third option is to keep your approach with match.fun and add the !! sym() that was missing in your example. However, I think do.call is more straightforward.

Option 4: Finally, you could use eval(parse(...)), but the first approach using do.call and !! sym() is preferable.
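For context on why the original attempt returned NA: inside summarise(), my.column is still just the string "col1", so the function is applied to a length-one character vector rather than to the column. A minimal base R sketch of that failure, and of what do.call does once it receives actual values (the vector x is a made-up stand-in for one group's col1 values, not something from the question):

```r
math.operation <- "mean"
my.column <- "col1"

# What the original last line effectively evaluates: mean("col1").
# mean() on a character vector returns NA and emits a warning,
# which is suppressed here to keep the output clean.
res_string <- suppressWarnings(match.fun(math.operation)(my.column))
is.na(res_string)

# Hypothetical stand-in for the values of col1 in the group id == 1.
x <- 1:4

# do.call() looks the function up by its name and calls it with
# the elements of the list as arguments, i.e. mean(x).
res_values <- do.call(math.operation, list(x))
res_values
```

Wrapping the string in !! sym() turns it into the symbol col1 before summarise() evaluates the expression, so the function then sees the column values of each group instead of the column name.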

library(dplyr)

math.operation <- 'mean' # this could be mean, sum or length

my.column <- 'col1'

dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)
# Option 1
dat %>% 
  dplyr::group_by(id) %>% 
  dplyr::summarise(newvar = do.call(math.operation, list(!! sym(my.column))))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

# Option 2
dat %>% 
  dplyr::group_by(id) %>%
  dplyr::summarise(newvar = eval(call(math.operation, !! sym(my.column))))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

# Option 3
dat %>% 
  dplyr::group_by(id) %>%
  dplyr::summarise(newvar = match.fun(math.operation)(!! sym(my.column)))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

# Option 4
dat %>% 
  dplyr::group_by(id) %>% 
  dplyr::summarise(newvar = eval(parse(text = paste0(math.operation, "(", my.column , ")"))))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>      id newvar
#>   <int>  <dbl>
#> 1     1    2.5
#> 2     2    6.5
#> 3     3   10.5
#> 4     4   14.5

Created on 2020-07-08 by the reprex package (v0.3.0)
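The .data pronoun from tidy evaluation is another way to refer to a column by a string without !! sym(). The following is only a sketch and was not part of the reprex above; it assumes an installed dplyr version that supports .data inside summarise(), and the helper name f is introduced here for illustration:

```r
library(dplyr)

math.operation <- "mean" # this could be mean, sum or length
my.column <- "col1"

dat <- data.frame(id = rep(1:4, each = 4),
                  col1 = 1:16,
                  col2 = 16:1)

# Resolve the function from its name once, outside the pipeline.
f <- match.fun(math.operation)

# .data[[my.column]] extracts the column named by the string
# from the current group's data.
res <- dat %>%
  group_by(id) %>%
  summarise(newvar = f(.data[[my.column]]))
res
```

This keeps the string-to-column lookup explicit and avoids both unquoting and eval(parse(...)).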

Upvotes: 1
