I have a function in R which takes a scalar and a vector as arguments, to perform some operation on them returning a single value.
Given a "series" of scalars (here, the vector mya) and a "series" of vectors (here, the matrix myv), how can I vectorize the call to myf so that each element in mya goes with the corresponding vector in myv?
mya = 1:3
myv = matrix(1:30, 10, 3)
myf = function(a, v) {
return(sum(a / (a/v + 1)))
}
sapply(1:3, function(x) {myf(mya[x], myv[,x])})
# [1] 7.980123 17.649590 26.809440
So I would like to avoid the looping sapply call above and instead do something directly like:
myf(mya, myv)
# [1] 49.37443 <- Here I would like 3 values
The big issue here is performance: in my real situation, mya and myv would have more than 10e6 values or vectors respectively, and myf is much more complex.
Up front, your myv might be organized as a series of vectors, one column each; it is better for many tools to convert it into a list of vectors.
asplit(myv, 2)
# [[1]]
# [1] 1 2 3 4 5 6 7 8 9 10
# [[2]]
# [1] 11 12 13 14 15 16 17 18 19 20
# [[3]]
# [1] 21 22 23 24 25 26 27 28 29 30
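If asplit is not available (it was added in R 3.6.0), a base R equivalent might look like the sketch below. The name myv_list is only illustrative and is not part of the original code.

```r
myv <- matrix(1:30, 10, 3)

# Build a list with one element per column of myv (hypothetical name: myv_list)
myv_list <- lapply(seq_len(ncol(myv)), function(j) myv[, j])

# Each list element is a plain vector holding one column
str(myv_list)
```

This list can be used anywhere the result of asplit(myv, 2) is used in the rest of this answer.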
sapply/lapply are to a single vector/list as mapply/Map are to n of them.
Map(myf, mya, asplit(myv , 2))
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
mapply(myf, mya, asplit(myv , 2))
# [1] 7.980123 17.649590 26.809440
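As a small sketch of how the two relate: Map is a thin wrapper around mapply that never simplifies its result, so calling mapply with SIMPLIFY = FALSE should give the same list. The names cols, res_map and res_mapply are illustrative only.

```r
mya <- 1:3
myv <- matrix(1:30, 10, 3)
myf <- function(a, v) sum(a / (a / v + 1))

# One list element per column of myv (illustrative name)
cols <- asplit(myv, 2)

res_map    <- Map(myf, mya, cols)                       # always returns a list
res_mapply <- mapply(myf, mya, cols, SIMPLIFY = FALSE)  # same call, simplification turned off

identical(res_map, res_mapply)
```

Both are still loops over the inputs internally, so they tidy up the call rather than remove the per-element function calls.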
With purrr::pmap, the order of arguments is different (the inputs come first, the function second), and instead of individual arguments it needs all of them wrapped in a single list.
purrr::pmap(list(mya, asplit(myv , 2)), myf)
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
purrr::pmap_dbl(list(mya, asplit(myv , 2)), myf)
# [1] 7.980123 17.649590 26.809440
Alternative approach, given the comments.
This approach truly is vectorized, but has deconstructed the function a little.
colSums(t(mya / (mya / t(myv) + 1)))
# [1] 7.980123 17.649590 26.809440
To get to this point, one needs to recognize where transposing (t) and such is necessary. I'll start with some known points:
mya[1] / myv[,1] + 1
# [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000
In order to mimic that with matrices (and not just vectors), we might try
(mya / myv + 1)
# [,1] [,2] [,3]
# [1,] 2.000000 1.181818 1.142857
# [2,] 2.000000 1.250000 1.045455
# [3,] 2.000000 1.076923 1.086957
# [4,] 1.250000 1.142857 1.125000
# [5,] 1.400000 1.200000 1.040000
# [6,] 1.500000 1.062500 1.076923
# [7,] 1.142857 1.117647 1.111111
# [8,] 1.250000 1.166667 1.035714
# [9,] 1.333333 1.052632 1.068966
# [10,] 1.100000 1.100000 1.100000
But if you notice, the division of mya over myv is column-wise, so it is expanding to
c(mya[1] / myv[1,1], mya[2] / myv[2,1], mya[3] / myv[3,1], mya[1] / myv[4,1], ...)
where we would prefer it to be transposed. Okay, so we transpose myv so that each of its columns becomes a row, which lets the recycled mya line up with the right vector during the division.
(mya / t(myv) + 1)[1,]
# [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000
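The recycling behavior described here may be easier to see on a smaller matrix. This is only a sketch; the names m and a are illustrative stand-ins for myv and mya.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)  # columns are (1, 2), (3, 4), (5, 6)
a <- 1:3                              # one scalar per column of m

# R recycles a down the columns of m in storage order:
# a[1]/m[1,1], a[2]/m[2,1], a[3]/m[1,2], a[1]/m[2,2], ...
wrong <- a / m

# After transposing, each column of m is a row of t(m), and since t(m)
# has exactly length(a) rows, a[i] is paired with every element of row i
right <- a / t(m)

# Row i of the transposed result matches the per-column computation
all.equal(right[2, ], a[2] / m[, 2])
```

The alignment works because the number of rows of t(m) equals the length of a, so the recycling restarts exactly at the top of every column.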
That's better. Now we need to do the same for the next step. That brings us to
t(mya / (mya / t(myv) + 1))
# [,1] [,2] [,3]
# [1,] 0.5000000 1.692308 2.625000
# [2,] 0.6666667 1.714286 2.640000
# [3,] 0.7500000 1.733333 2.653846
# [4,] 0.8000000 1.750000 2.666667
# [5,] 0.8333333 1.764706 2.678571
# [6,] 0.8571429 1.777778 2.689655
# [7,] 0.8750000 1.789474 2.700000
# [8,] 0.8888889 1.800000 2.709677
# [9,] 0.9000000 1.809524 2.718750
# [10,] 0.9090909 1.818182 2.727273
You wanted one sum for each of the mya values. Knowing that we have three values in mya and we see three columns, one might infer we need to sum each column. We can check that empirically:
sum(mya[1] / (mya[1] / myv[,1] + 1))
# [1] 7.980123
colSums(t(mya / (mya / t(myv) + 1)))
# [1] 7.980123 17.649590 26.809440
But really, we don't need to transpose and then sum the columns when we can skip the transpose and sum the rows :-)
rowSums(mya / (mya / t(myv) + 1))
# [1] 7.980123 17.649590 26.809440
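Since performance was the concern, one way to compare the looping and vectorized versions is sketched below on simulated data. The size n, the random inputs, and the names res_loop, res_vec, t_loop and t_vec are assumptions for illustration; actual timings will depend on the machine and on how complex the real myf is.

```r
set.seed(42)
n   <- 1e4                                     # assumed size, far smaller than the real data
mya <- runif(n)                                # simulated scalars
myv <- matrix(runif(10 * n), nrow = 10, ncol = n)  # simulated vectors, one per column
myf <- function(a, v) sum(a / (a / v + 1))

# Looping version: one call to myf per column
t_loop <- system.time(res_loop <- mapply(myf, mya, asplit(myv, 2)))

# Vectorized version: whole-matrix arithmetic followed by row sums
t_vec <- system.time(res_vec <- rowSums(mya / (mya / t(myv) + 1)))

# Both approaches should agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(res_loop, res_vec)))

rbind(loop = t_loop, vectorized = t_vec)
```

Note that this only works because the body of myf can be rewritten with matrix arithmetic; a more complex myf may or may not decompose this way.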