stevec

Reputation: 52268

How to vectorize a subsetting function in R?

I've had some luck vectorizing certain functions, which is great for clean code, avoiding loops, and speed.

However, I have not been able to vectorize any function that subsets a data frame based on the function's inputs.

Example

For example, this function works well when it receives single (length-one) values:

library(dplyr) # provides filter() and re-exports the %>% pipe

test_funct <- function(sep_wid, sep_len) {
    iris %>% filter(Sepal.Width > sep_wid & Sepal.Length < sep_len) %>% .$Petal.Width %>% sum
}

test_funct(4, 6)

# [1] 0.7 # This works nicely

But when vectors are provided as inputs to this function, it returns a single value:

sep_wid_vector <- c(4, 3.5, 3)
sep_len_vector <- c(6, 6, 6.5)


test_funct(sep_wid_vector, sep_len_vector)

# [1] 9.1

The desired output, however, is a vector of the same length as the input vectors, as though the function were run on the first elements of each vector, then the second, then the third, i.e.:

# 0.7    4.2     28.5 

For convenience, here is the output when each pair of inputs is run separately:

test_funct(4, 6) # 0.7
test_funct(3.5, 6) # 4.2
test_funct(3, 6.5) # 28.5

How can I vectorize a function that subsets data based on its inputs so that it can receive vector inputs?

Upvotes: 6

Views: 351

Answers (3)

Tarquinnn

Reputation: 511

The problem is that filter accepts vector inputs, so sep_wid and sep_len are recycled against the Sepal.Width and Sepal.Length columns in the comparisons instead of being handled one pair at a time.
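To make the recycling concrete, here is a small base R sketch. The vectors `x` and `thresholds` are made up for illustration and are not part of the question; the point is only that a comparison against a shorter vector is applied elementwise, with the shorter vector repeated.

```r
# Base R comparison operators recycle the shorter operand elementwise.
x <- c(1, 2, 3, 4, 5, 6)      # illustrative data, not from the question
thresholds <- c(2, 4, 3)      # illustrative thresholds

# Compares 1 > 2, 2 > 4, 3 > 3, then recycles: 4 > 2, 5 > 4, 6 > 3
recycled <- x > thresholds
print(recycled)
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

# iris has 150 rows, a multiple of 3, so a length-3 threshold vector is
# recycled against Sepal.Width silently, giving one logical value per row.
n_flags <- length(iris$Sepal.Width > c(4, 3.5, 3))
print(n_flags)
# [1] 150
```

This is why the original function returns one number for vector inputs: each row is compared against whichever threshold happens to line up with it, and the sum is taken over all rows that pass.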

One way to do this would be to use map2 from the purrr package:

map2_dbl(sep_wid_vector, sep_len_vector, test_funct)

Of course you could then wrap this in a function. You might also want to consider passing in the data frame as a function parameter.
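A minimal sketch of that wrapping, assuming purrr is installed. The names `sum_petal_width` and `sum_petal_width_vec` are hypothetical, and the scalar helper uses base R subsetting instead of dplyr only to keep the example short; the data frame is passed in as the first argument rather than hard-coded.

```r
library(purrr)

# Hypothetical scalar helper: `data` is a parameter instead of a hard-coded iris.
sum_petal_width <- function(data, sep_wid, sep_len) {
  keep <- data$Sepal.Width > sep_wid & data$Sepal.Length < sep_len
  sum(data$Petal.Width[keep])
}

# Hypothetical vectorized wrapper: calls the helper once per pair of thresholds.
sum_petal_width_vec <- function(data, sep_wid, sep_len) {
  map2_dbl(sep_wid, sep_len, function(w, l) sum_petal_width(data, w, l))
}

result <- sum_petal_width_vec(iris, c(4, 3.5, 3), c(6, 6, 6.5))
print(result)
# [1]  0.7  4.2 28.5
```

Passing the data frame explicitly makes the function reusable on any data frame with the same column names and easier to test.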

Upvotes: 5

nsinghphd

Reputation: 2022

Here is one way, using sapply to loop over the indices of the input vectors:

# function using sapply
test_funct <- function(sep_wid, sep_len) {
  sapply(seq_along(sep_wid), function(x) {
    sum(iris$Petal.Width[iris$Sepal.Width > sep_wid[x] & iris$Sepal.Length < sep_len[x]])
  })
}

# testing with single value
test_funct(4, 6)
# [1] 0.7

# testing with vectors
test_funct(sep_wid_vector, sep_len_vector)
# [1]  0.7  4.2 28.5

Upvotes: 2

thothal

Reputation: 20329

You can use Vectorize:

tv <- Vectorize(test_funct)

tv(sep_wid_vector, sep_len_vector)
# [1]  0.7  4.2 28.5

This is basically a wrapper around mapply. Be aware that under the hood you are still running an *apply function, which is itself a kind of loop, so this gives cleaner code rather than a real speed-up.
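To illustrate the relationship, the sketch below calls mapply directly and compares the result with the Vectorize version. The scalar function is rewritten here in base R only so the example is self-contained; it is assumed to behave like the dplyr version in the question.

```r
# Scalar version of the function, in base R so the sketch needs no packages.
test_funct <- function(sep_wid, sep_len) {
  sum(iris$Petal.Width[iris$Sepal.Width > sep_wid & iris$Sepal.Length < sep_len])
}

sep_wid_vector <- c(4, 3.5, 3)
sep_len_vector <- c(6, 6, 6.5)

# mapply calls test_funct once per pair of elements, much like an explicit loop.
via_mapply <- mapply(test_funct, sep_wid_vector, sep_len_vector)

# Vectorize returns a wrapper that makes the same kind of mapply call internally.
via_vectorize <- Vectorize(test_funct)(sep_wid_vector, sep_len_vector)

print(via_mapply)
# [1]  0.7  4.2 28.5
print(all.equal(via_mapply, via_vectorize))
# [1] TRUE
```

Either form runs the scalar function once per input pair, so the cost grows with the number of pairs.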

Upvotes: 5
