Reputation: 33
I would like to vectorize the following code in for more efficient processing. I need to take the product of columns by row (i.e. rowProds), but the number of columns I would like the product of needs to be a function of another input.
If possible I would prefer this be done using Base R, but I'm open to and appreciate any suggestions.
This can easily be done using a loop or apply family with a udf, but these are not fast enough to meet my needs.
# Generate some data
mat <- data.frame(X = 1:5)
for (i in 1:5) {
set.seed(i)
mat[1 + i] <- runif(5)
}
# Via a for loop
for (i in 1:nrow(mat)) {
mat$calc[i] <- prod(mat[match(mat$X[i], mat$X), 2:(i + 1)])
}
mat
# Via a function with mapply
rowprodfun <- function(X) {
myprod <- prod(mat[match(X, mat$X), 2:(X + 1)])
return(myprod)
}
mat$calc <- mapply(rowprodfun, mat$X)
mat
mat$calc
# [1] 0.265508663 0.261370165 0.126427355 0.013874517 0.009758232
Both methods above result in the same "calc" column. I just need a more efficient way to generate this column.
Upvotes: 3
Views: 117
Reputation: 33
The use of upper.tri
suggested by @akrun was super helpful. The final piece was to convert the data.frame
to a matrix as.matrix
before doing the element-wise multiplication.
rowProds(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), na.rm = T)
resulted in the most efficient calculation.
apply(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), 1, prod, na.rm = T)
was nearly as efficient if trying to accomplish in base R.
library(microbenchmark)
library(matrixStats)
library(ggplot2)
Y <- microbenchmark(
for.loop = for (i in 1:nrow(mat)) {prod(mat[match(mat$X[i], mat$X), 2:(i + 1)])},
mapply.fun = mapply(rowprodfun, mat$X),
rowProds = rowProds(as.matrix(mat[-1] * NA ^ upper.tri(mat[-1])), na.rm = T),
rowProds.matrix = rowProds(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), na.rm = T),
apply = apply(mat[-1] * NA ^ upper.tri(mat[-1]), 1, prod, na.rm = T),
apply.matrix = apply(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), 1, prod, na.rm = T)
)
> Y
Unit: microseconds
expr min lq mean median uq max neval
for.loop 4094.869 4305.5590 5682.2124 4479.8125 5193.8190 50361.025 100
mapply.fun 542.962 577.6995 1036.9821 599.2220 658.1245 32426.296 100
rowProds 518.419 553.9120 654.2657 597.5225 637.1690 2434.267 100
rowProds.matrix 99.304 116.1065 144.9313 128.0010 153.8650 516.909 100
apply 547.493 580.1540 686.2317 628.2955 703.0565 1215.812 100
apply.matrix 117.051 136.6845 158.3808 144.9920 156.5075 339.068 100
Upvotes: 0
Reputation: 887128
One option would be to convert the upper triangle elements as NA
and then use rowProds
from matrixStats
library(matrixStats)
rowProds(as.matrix(mat[-1] * NA^upper.tri(mat[-1])), na.rm = TRUE)
#[1] 0.265508663 0.261370165 0.126427355 0.013874517 0.009758232
Upvotes: 1