Reputation: 52268
I've had some luck vectorizing certain functions, which is great for clean code, avoiding loops, and speed.
However, I have not been able to vectorize any function that subsets a data frame based on its inputs.
For example, this function works well when it receives single values:
library(dplyr)
test_funct <- function(sep_wid, sep_len) {
  iris %>% filter(Sepal.Width > sep_wid & Sepal.Length < sep_len) %>% .$Petal.Width %>% sum
}
test_funct(4, 6)
# [1] 0.7 # This works nicely
But when I provide vectors as inputs, the function returns a single value:
sep_wid_vector <- c(4, 3.5, 3)
sep_len_vector <- c(6, 6, 6.5)
test_funct(sep_wid_vector, sep_len_vector)
[1] 9.1
But the desired output is a vector of the same length as the input vectors, as though the function was run on the first elements of each vector, then the second, then the third. i.e.
# 0.7 4.2 28.5
For convenience, here is the output as if these were all run separately:
test_funct(4, 6) # 0.7
test_funct(3.5, 6) # 4.2
test_funct(3, 6.5) # 28.5
How can I vectorize a function that subsets data based on its inputs so that it can receive vector inputs?
Upvotes: 6
Views: 351
Reputation: 511
The problem is that the comparisons inside filter are themselves vectorized, so sep_wid and sep_len are recycled along the Sepal.Width and Sepal.Length columns instead of being applied one pair at a time.
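To see the recycling in isolation, here is a small sketch in base R. The vectors x and thresholds are made up for illustration and are not from the question:

```r
# Illustrative values only: a 6-element "column" and a 3-element input
x <- c(5, 3, 4, 5, 3, 4)
thresholds <- c(4, 3.5, 3)

# The shorter vector is recycled, so x is compared against
# 4, 3.5, 3, 4, 3.5, 3 (element by element, not pair by pair)
recycled <- x > thresholds
print(recycled)
# [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE

# Same result as spelling the recycling out explicitly
explicit <- x > rep_len(thresholds, length(x))
print(identical(recycled, explicit))
# [1] TRUE
```

The same thing happens inside filter: iris has 150 rows, which is a multiple of 3, so the length-3 inputs are recycled silently and the function sums over whichever rows happen to pass.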
One way to do this would be to use map2_dbl (the variant of map2 that returns a numeric vector) from the purrr package:
library(purrr)
map2_dbl(sep_wid_vector, sep_len_vector, test_funct)
Of course you could then wrap this in a function. You might also want to consider passing in the data frame as a function parameter.
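A minimal sketch of what that wrapping might look like, with the data frame passed in as a parameter. The names sum_petal_width and sum_petal_width_vec are hypothetical, chosen here for illustration, and the column names are still hard-coded for iris:

```r
library(dplyr)
library(purrr)

# Hypothetical scalar helper: sums Petal.Width for one (sep_wid, sep_len) pair
sum_petal_width <- function(data, sep_wid, sep_len) {
  data %>%
    filter(Sepal.Width > sep_wid, Sepal.Length < sep_len) %>%
    pull(Petal.Width) %>%
    sum()
}

# Hypothetical vectorized wrapper: calls the helper once per pair of inputs
sum_petal_width_vec <- function(data, sep_wid, sep_len) {
  map2_dbl(sep_wid, sep_len, function(w, l) sum_petal_width(data, w, l))
}

result <- sum_petal_width_vec(iris, c(4, 3.5, 3), c(6, 6, 6.5))
print(result)
```

With the vectors from the question this should reproduce the desired 0.7 4.2 28.5. map2_dbl also raises an error if the two inputs have incompatible lengths, which guards against the silent recycling described above.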
Upvotes: 5
Reputation: 2022
Here is one way using sapply:
# function using sapply
test_funct <- function(sep_wid, sep_len) {
  sapply(seq_along(sep_wid), function(x) {
    sum(iris$Petal.Width[iris$Sepal.Width > sep_wid[x] & iris$Sepal.Length < sep_len[x]])
  })
}
# testing with single value
test_funct(4,6)
[1] 0.7
# testing with vectors
test_funct(sep_wid_vector, sep_len_vector)
[1] 0.7 4.2 28.5
Upvotes: 2
Reputation: 20329
You can use Vectorize:
tv <- Vectorize(test_funct)
tv(sep_wid_vector, sep_len_vector)
# [1] 0.7 4.2 28.5
This is basically a wrapper around mapply. Be aware that under the hood you are running an *apply function, which is also a kind of loop.
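To make the relationship concrete, here is a rough sketch comparing Vectorize with a direct mapply call. It uses a base R version of the scalar function under the hypothetical name sum_petal_width, so it runs without dplyr:

```r
# Hypothetical base R version of the scalar function from the question
sum_petal_width <- function(sep_wid, sep_len) {
  sum(iris$Petal.Width[iris$Sepal.Width > sep_wid & iris$Sepal.Length < sep_len])
}

sep_wid_vector <- c(4, 3.5, 3)
sep_len_vector <- c(6, 6, 6.5)

# Vectorize returns a new function that loops over its arguments
via_vectorize <- Vectorize(sum_petal_width)(sep_wid_vector, sep_len_vector)

# Calling mapply directly performs the same element-wise loop
via_mapply <- mapply(sum_petal_width, sep_wid_vector, sep_len_vector)

print(all.equal(via_vectorize, via_mapply))
```

Both calls evaluate the function once per pair of inputs, so the gain is in convenience rather than speed.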
Upvotes: 5