Reputation: 4284
I'm struggling to write a function that works inside dplyr::mutate().
Since rowwise() %>% sum() is quite slow on large datasets, the usual suggestion is to fall back to base R. I'm hoping to streamline this as shown below, but I'm having trouble passing the data to my function inside mutate().
require(tidyverse)
#> Loading required package: tidyverse
#I'd like to write a function that works inside mutate and replaces the rowSums(select()).
cars <- as_tibble(cars)
cars %>%
mutate(sum = rowSums(select(., speed, dist), na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
#Here is my first attempt.
rowwise_sum <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
#Doesn't work as expected:
cars %>% mutate(sum = rowwise_sum(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#But called on its own, it returns a vector.
cars %>% rowwise_sum(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#The data does not appear to be passed to the function. Passing it explicitly with a dot works.
cars %>% mutate(sum = rowwise_sum(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
So the question becomes: how can I avoid having to include the dot every time, and have the data passed to the function automatically instead?
rowwise_sum2 <- function(data, ..., na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(., na.rm = na.rm)
}
#Same error
cars %>% mutate(sum = rowwise_sum2(speed, dist, na.rm = T))
#> Error in mutate_impl(.data, dots): Evaluation error: no applicable method for 'select_' applied to an object of class "c('double', 'numeric')".
#Same result
cars %>% rowwise_sum2(speed, dist, na.rm = T)
#> [1] 6 14 11 29 24 19 28 36 44 28 39 26 32 36 40 39 47
#> [18] 47 59 40 50 74 94 35 41 69 48 56 49 57 67 60 74 94
#> [35] 102 55 65 87 52 68 72 76 84 88 77 94 116 117 144 110
#Same result
cars %>% mutate(sum = rowwise_sum2(., speed, dist, na.rm = T))
#> # A tibble: 50 x 3
#> speed dist sum
#> <dbl> <dbl> <dbl>
#> 1 4. 2. 6.
#> 2 4. 10. 14.
#> 3 7. 4. 11.
#> 4 7. 22. 29.
#> 5 8. 16. 24.
#> 6 9. 10. 19.
#> 7 10. 18. 28.
#> 8 10. 26. 36.
#> 9 10. 34. 44.
#> 10 11. 17. 28.
#> # ... with 40 more rows
Created on 2018-05-22 by the reprex package (v0.2.0).
Answer from akrun below (please upvote):
To paraphrase: just ditch the mutate() and do everything in the new function.
Here is my final function, an update to his that also allows naming the sum column if desired.
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
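As a quick check of the final function above, here is a minimal sketch of how it might be called with a custom column name. The small tibble df and the column name "total" are made up for illustration and are not from the original post; the function definition is repeated only so the snippet runs on its own.

```r
library(dplyr)

# Same function as above, repeated so this snippet is self-contained.
rowwise_sum <- function(data, ..., sum_col = "sum", na.rm = FALSE) {
  columns <- rlang::enquos(...)
  data %>%
    select(!!!columns) %>%
    transmute(!!sum_col := rowSums(., na.rm = na.rm)) %>%
    bind_cols(data, .)
}

# Hypothetical example data with missing values.
df <- tibble(a = c(1, NA, 3), b = c(10, 20, NA))

# Sum columns a and b row by row, ignoring NAs, into a column named "total".
out <- rowwise_sum(df, a, b, sum_col = "total", na.rm = TRUE)
out$total
```

With na.rm = TRUE the missing values are skipped, so the new column should hold 11, 20 and 3. With the default na.rm = FALSE the second and third rows would be NA.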
Upvotes: 5
Views: 2460
Reputation: 887038
We can place the ... at the end:
rowwise_sum <- function(data, na.rm = FALSE, ...) {
columns <- rlang::enquos(...)
data %>%
select(!!!columns) %>%
rowSums(na.rm = na.rm)
}
cars %>%
mutate(sum = rowwise_sum(., na.rm = TRUE, speed, dist))
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
# ... with 40 more rows
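It would also work without changing the position of ... (though placing it last is generally recommended). The main issue here is that the data (that is, the dot) is not passed in the argument list of the call inside mutate(). Without it, the first column name, speed, is matched to the data argument, so select() receives a numeric vector instead of a data frame.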
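If the goal is only to avoid typing the dot, one possible workaround is a helper that accepts the column vectors themselves rather than a data frame, since inside mutate() bare column names evaluate to vectors. The sketch below is not from the original answer: row_sum is a hypothetical name, and it assumes all selected columns are numeric.

```r
library(dplyr)

# Hypothetical helper (not part of the original answer): takes column
# vectors, binds them into a matrix, and sums each row with base R.
row_sum <- function(..., na.rm = FALSE) {
  rowSums(cbind(...), na.rm = na.rm)
}

# Inside mutate(), speed and dist are evaluated as vectors, so no dot
# and no select() call are needed.
out <- as_tibble(cars) %>%
  mutate(sum = row_sum(speed, dist, na.rm = TRUE))

head(out$sum, 3)
```

For the built-in cars data the first three sums should be 6, 14 and 11, matching the output shown above. The trade-off is that tidyselect helpers such as starts_with() cannot be used, because the columns are passed as plain vectors.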
It would be easier to put the whole workflow inside the function instead of doing only part of it there:
rowwise_sum2 <- function(data, na.rm = FALSE, ...) {
columns <- rlang::enquos(...)
data %>%
select(!!! columns) %>%
transmute(sum = rowSums(., na.rm = na.rm)) %>%
bind_cols(data, .)
}
rowwise_sum2(cars, na.rm = TRUE, speed, dist)
# A tibble: 50 x 3
# speed dist sum
# <dbl> <dbl> <dbl>
# 1 4 2 6
# 2 4 10 14
# 3 7 4 11
# 4 7 22 29
# 5 8 16 24
# 6 9 10 19
# 7 10 18 28
# 8 10 26 36
# 9 10 34 44
#10 11 17 28
Upvotes: 4