jzadra

Reputation: 4284

Writing a custom function that works inside dplyr::mutate()

I'm struggling to write a function that works inside dplyr::mutate().

Since rowwise() %>% sum() is quite slow on large datasets, the usual suggestion is to fall back to base R. I'd like to streamline that approach as shown below, but I'm having trouble passing the data to my function inside mutate().

require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)

cars %>% 
  mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>% 
    select(!!!columns) %>% 
    rowSums(na.rm = na.rm)
}

#Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".

#But on its own it returns a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#>  [1]   6  14  11  29  24  19  28  36  44  28  39  26  32  36  40  39  47
#> [18]  47  59  40  50  74  94  35  41  69  48  56  49  57  67  60  74  94
#> [35] 102  55  65  87  52  68  72  76  84  88  77  94 116 117 144 110

#Appears to not be getting the data passed.  Specifying with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

So the question is: how can I avoid having to include the dot every time, and instead have the data passed to the function automatically?

rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>% 
    select(!!!columns) %>% 
    rowSums(., na.rm = na.rm)
}

#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".

#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#>  [1]   6  14  11  29  24  19  28  36  44  28  39  26  32  36  40  39  47
#> [18]  47  59  40  50  74  94  35  41  69  48  56  49  57  67  60  74  94
#> [35] 102  55  65  87  52  68  72  76  84  88  77  94 116 117 144 110

#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#>    speed  dist   sum
#>    <dbl> <dbl> <dbl>
#>  1    4.    2.    6.
#>  2    4.   10.   14.
#>  3    7.    4.   11.
#>  4    7.   22.   29.
#>  5    8.   16.   24.
#>  6    9.   10.   19.
#>  7   10.   18.   28.
#>  8   10.   26.   36.
#>  9   10.   34.   44.
#> 10   11.   17.   28.
#> # ... with 40 more rows

Created on 2018-05-22 by the reprex package (v0.2.0).


Answer from akrun below (please upvote):

To paraphrase: just ditch the mutate() and do everything in the new function.

Here is my final function, an update of his that also allows naming the sum column if desired.

rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {

  columns <- rlang::enquos(...)

  data %>%
    select(!!! columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}
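
For illustration, here is how that final function might be called. The column name total is just an example, and the definition is repeated so the snippet runs on its own; it assumes dplyr (and rlang, which dplyr depends on) is installed.

```r
library(dplyr)

# Same definition as above, repeated so this snippet is self-contained.
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)

  data %>%
    select(!!!columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

# "total" is an arbitrary example name for the new column.
totals <- rowwise_sum(as_tibble(cars), speed, dist, sum_col = "total", na.rm = TRUE)
names(totals)
```

Because the function returns the full data frame with the new column attached, it takes the place of the mutate() call rather than being used inside it.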

Upvotes: 5

Views: 2460

Answers (1)

akrun

Reputation: 887038

We can place the ... at the end of the argument list:

rowwise_sum <- function(data, na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  data %>%
     select(!!!columns) %>%
     rowSums(na.rm = na.rm)
}

cars %>% 
     mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
#   speed  dist   sum
#   <dbl> <dbl> <dbl>
# 1     4     2     6
# 2     4    10    14
# 3     7     4    11
# 4     7    22    29
# 5     8    16    24
# 6     9    10    19
# 7    10    18    28
# 8    10    26    36
# 9    10    34    44
#10    11    17    28
# ... with 40 more rows

It would also work without moving the ... (although placing it last is generally recommended). The main issue is that the data (that is, .) was not passed as an argument in the call inside mutate().
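
To illustrate the point about the missing data argument: inside mutate(), bare column names such as speed and dist already evaluate to the column vectors, so a helper that accepts vectors rather than a data frame needs no dot at all. The following is only a sketch of that alternative, not something proposed in the thread. The name row_sum_cols is hypothetical, and it assumes the columns passed in are all numeric.

```r
library(dplyr)

# Hypothetical helper (not from the thread): takes column vectors, not a data frame.
row_sum_cols <- function(..., na.rm = FALSE) {
  # cbind() assembles the vectors into a matrix with one column per argument,
  # which rowSums() can then sum row by row.
  rowSums(cbind(...), na.rm = na.rm)
}

# No dot needed: mutate() supplies speed and dist as vectors.
result <- as_tibble(cars) %>%
  mutate(sum = row_sum_cols(speed, dist, na.rm = TRUE))
```

The trade-off is that this version cannot use select() helpers such as starts_with(), since it never sees the data frame itself.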


It would be easier to put the whole workflow inside the function instead of only part of it:

rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
  columns <- rlang::enquos(...)
  data %>%
      select(!!! columns) %>%
      transmute(sum = rowSums(., na.rm = na.rm)) %>%
      bind_cols(data, .)

}

rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
#   speed  dist   sum
#   <dbl> <dbl> <dbl>
# 1     4     2     6
# 2     4    10    14
# 3     7     4    11
# 4     7    22    29
# 5     8    16    24
# 6     9    10    19
# 7    10    18    28
# 8    10    26    36
# 9    10    34    44
#10    11    17    28

Upvotes: 4
