rwteg
rwteg

Reputation: 33

Product of columns, by row, with dynamic column input - Vectorize operation

I would like to vectorize the following code in for more efficient processing. I need to take the product of columns by row (i.e. rowProds), but the number of columns I would like the product of needs to be a function of another input.

If possible I would prefer this be done using Base R, but I'm open to and appreciate any suggestions.

This can easily be done using a loop or apply family with a udf, but these are not fast enough to meet my needs.

# Generate some data

mat <- data.frame(X = 1:5)
for (i in 1:5) {
  set.seed(i)
  mat[1 + i] <- runif(5)
}

# Via a for loop

for (i in 1:nrow(mat)) {  
  mat$calc[i] <- prod(mat[match(mat$X[i], mat$X), 2:(i + 1)])
}
mat

# Via a function with mapply

rowprodfun <- function(X) {  
  myprod <- prod(mat[match(X, mat$X), 2:(X + 1)])
  return(myprod)
}

mat$calc <- mapply(rowprodfun, mat$X)
mat

mat$calc
# [1] 0.265508663 0.261370165 0.126427355 0.013874517 0.009758232

Both methods above result in the same "calc" column. I just need a more efficient way to generate this column.

Upvotes: 3

Views: 117

Answers (2)

rwteg
rwteg

Reputation: 33

The use of upper.tri suggested by @akrun was super helpful. The final piece was to convert the data.frame to a matrix as.matrix before doing the element-wise multiplication.

rowProds(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), na.rm = T) resulted in the most efficient calculation.

apply(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), 1, prod, na.rm = T) was nearly as efficient if trying to accomplish in base R.

library(microbenchmark)
library(matrixStats)
library(ggplot2)

Y <- microbenchmark(
  for.loop = for (i in 1:nrow(mat)) {prod(mat[match(mat$X[i], mat$X), 2:(i + 1)])},
  mapply.fun = mapply(rowprodfun, mat$X),
  rowProds = rowProds(as.matrix(mat[-1] * NA ^ upper.tri(mat[-1])), na.rm = T),
  rowProds.matrix = rowProds(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), na.rm = T),
  apply = apply(mat[-1] * NA ^ upper.tri(mat[-1]), 1, prod, na.rm = T),
  apply.matrix = apply(as.matrix(mat[-1]) * NA ^ upper.tri(mat[-1]), 1, prod, na.rm = T)
)

> Y
Unit: microseconds
            expr      min        lq      mean    median        uq       max neval
        for.loop 4094.869 4305.5590 5682.2124 4479.8125 5193.8190 50361.025   100
      mapply.fun  542.962  577.6995 1036.9821  599.2220  658.1245 32426.296   100
        rowProds  518.419  553.9120  654.2657  597.5225  637.1690  2434.267   100
 rowProds.matrix   99.304  116.1065  144.9313  128.0010  153.8650   516.909   100
           apply  547.493  580.1540  686.2317  628.2955  703.0565  1215.812   100
    apply.matrix  117.051  136.6845  158.3808  144.9920  156.5075   339.068   100

benchmarkautoplot

Upvotes: 0

akrun
akrun

Reputation: 887128

One option would be to convert the upper triangle elements as NA and then use rowProds from matrixStats

library(matrixStats)
rowProds(as.matrix(mat[-1] * NA^upper.tri(mat[-1])), na.rm = TRUE)
#[1] 0.265508663 0.261370165 0.126427355 0.013874517 0.009758232

Upvotes: 1

Related Questions