ztl
ztl

Reputation: 2592

How to vectorize an operation on a "series" of vectors in R

I have a function in R which takes a scalar and a vector as arguments, to perform some operation on them returning a single value.

Given a "series" of scalars (here, the vector mya) and a "series" of vectors (here, the matrix myv), how can I vectorize the call to myf so that each element in mya goes with the corresponding vector in myv?

mya = 1:3
myv = matrix(1:30, 10, 3)

myf = function(a, v) {
  return(sum(a / (a/v + 1)))
}

sapply(1:3, function(x) {myf(mya[x], myv[,x])})
# [1]  7.980123 17.649590 26.809440

So above I would like to avoid the looping sapply operation to do directly something like:

myf(mya, myv)
# [1] 49.37443   <- Here I would like 3 values

The big issue here is performance: in my real situation, mya and myv would have more than 10e6 values or vectors respectively, and myf is much more complex.

Upvotes: 1

Views: 249

Answers (1)

r2evans
r2evans

Reputation: 160447

Up front, your myv might be organized as a series of vectors, one column each; it is better for many tools to convert it into a list of vectors.

asplit(myv, 2)
# [[1]]
#  [1]  1  2  3  4  5  6  7  8  9 10
# [[2]]
#  [1] 11 12 13 14 15 16 17 18 19 20
# [[3]]
#  [1] 21 22 23 24 25 26 27 28 29 30

base R

sapply/lapply are to a single vector/list as mapply/Map are to n of them.

Map(myf, mya, asplit(myv , 2))
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
mapply(myf, mya, asplit(myv , 2))
# [1]  7.980123 17.649590 26.809440

tidyverse

The order of arguments is different, and instead of individual arguments it needs all of them in a list itself.

purrr::pmap(list(mya, asplit(myv , 2)), myf)
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
purrr::pmap_dbl(list(mya, asplit(myv , 2)), myf)
# [1]  7.980123 17.649590 26.809440

Alternative approach, given the comments.

This approach truly is vectorized, but has deconstructed the function a little.

colSums(t(mya / (mya / t(myv) + 1)))
# [1]  7.980123 17.649590 26.809440

To get to this point, one needs to recognize where transpose and such is necessary. I'll start with some known points:

mya[1] / myv[,1] + 1
#  [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000

In order to mimic that with matrices (and not just vectors), we might try

(mya / myv + 1)
#           [,1]     [,2]     [,3]
#  [1,] 2.000000 1.181818 1.142857
#  [2,] 2.000000 1.250000 1.045455
#  [3,] 2.000000 1.076923 1.086957
#  [4,] 1.250000 1.142857 1.125000
#  [5,] 1.400000 1.200000 1.040000
#  [6,] 1.500000 1.062500 1.076923
#  [7,] 1.142857 1.117647 1.111111
#  [8,] 1.250000 1.166667 1.035714
#  [9,] 1.333333 1.052632 1.068966
# [10,] 1.100000 1.100000 1.100000

But if you notice, the division of mya over myv is column-wise, so it is expanding to

c(mya[1] / myv[1,1], mya[2] / myv[2,1], mya[3] / myv[3,1], mya[1] / myv[4,1], ...)

where we would prefer it to be transposed. Okay, so we transpose it so that the rows of myv are vertical for the division.

(mya / t(myv) + 1)[1,]
#  [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000

That's better. Now we need to do the same for the next step. That brings us to

t(mya / (mya / t(myv) + 1))
#            [,1]     [,2]     [,3]
#  [1,] 0.5000000 1.692308 2.625000
#  [2,] 0.6666667 1.714286 2.640000
#  [3,] 0.7500000 1.733333 2.653846
#  [4,] 0.8000000 1.750000 2.666667
#  [5,] 0.8333333 1.764706 2.678571
#  [6,] 0.8571429 1.777778 2.689655
#  [7,] 0.8750000 1.789474 2.700000
#  [8,] 0.8888889 1.800000 2.709677
#  [9,] 0.9000000 1.809524 2.718750
# [10,] 0.9090909 1.818182 2.727273

Since you wanted to sum across each of the mya values. Knowing that we have three in mya and we see three columns, one might infer we need to sum each column. We can prove that empirically:

sum(mya[1] / (mya[1] / myv[,1] + 1))
# [1] 7.980123
colSums(t(mya / (mya / t(myv) + 1)))
# [1]  7.980123 17.649590 26.809440

But really, we don't need to tranpose then sum columns when we can not-transpose and sum the rows :-)

rowSums(mya / (mya / t(myv) + 1))
# [1]  7.980123 17.649590 26.809440

Upvotes: 3

Related Questions