Tim

Reputation: 99408

Component-wise product of a sparse vector with a component-wise function of another vector in R

Suppose I have two vectors b and a. Almost all components of the latter (a) are zero; only a few are non-zero.

If I want to compute the component-wise product of a and a component-wise function (such as exp) of b, I can do

a*exp(b)

However, for the majority of components, where a is zero, evaluating exp on the corresponding components of b is wasted work.

In cases such as this one, is it possible to program this more efficiently in R, or is there no need to change anything? Thanks!

Upvotes: 0

Views: 552

Answers (4)

Gavin Simpson

Reputation: 174778

To expand on DWin's answer, and your comment on it, just keep track of the zeros and add the trivial answers back in:

## Dummy data
set.seed(1)
a <- sample(0:10, 100, replace = TRUE)
b <- runif(100)

## something to hold results
out <- numeric(length(a))
## the computations you *want* to do
want <- a != 0
## fill in the wanted answers
out[want] <- a[want] * exp(b[want])

Which gives the correct results:

> all.equal(out, a * exp(b))
[1] TRUE

If you wanted, you could wrap this into a function:

myFun <- function(a, b) {
    out <- numeric(length(a))
    want <- a != 0
    out[want] <- a[want] * exp(b[want])
    return(out)
}

Then use it

> all.equal(out, myFun(a, b))
[1] TRUE
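
The wrapper above hard-codes exp(). As a rough sketch only, the same book-keeping can be generalised to any vectorised function. The name sparseProduct and the argument f are hypothetical, not part of the original answer, and the sketch assumes a contains no NA values and that f returns one finite value per input element (where f(b) is Inf, a * f(b) would give NaN for a zero in a, whereas this version gives 0).

```r
## Hypothetical generalisation of myFun(); `sparseProduct` and `f` are
## illustrative names. Assumes no NA in `a` and a vectorised `f`.
sparseProduct <- function(a, b, f = exp, ...) {
    stopifnot(length(a) == length(b))
    out <- numeric(length(a))          # zeros by default
    want <- a != 0                     # positions that need real work
    out[want] <- a[want] * f(b[want], ...)
    out
}

## usage: same result as the direct expression
a <- c(0, 2, 0, 4)
b <- c(1, 2, 3, 4)
all.equal(sparseProduct(a, b), a * exp(b))
all.equal(sparseProduct(a, b, f = sqrt), a * sqrt(b))
```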

But none of this is more efficient than using a * exp(b) directly. Both * and exp() are vectorised, so they run very quickly, much more quickly than any of the book-keeping measures used in the various answers so far.

Whether you need the book-keeping solutions will depend on how expensive your function (exp() in the example in your question) is in compute terms. Try both approaches on a small sample and compare the timings (using system.time()) to see whether the subsetting needed to track the zeros is worth the extra effort.
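
One way such a comparison might look is sketched below. The function slow_fun is a made-up stand-in for an expensive element-wise function, and the 1% density is an assumption; the timings depend entirely on your machine and your real function, so none are shown here.

```r
## Hypothetical stand-in for an expensive vectorised function (an assumption)
slow_fun <- function(x) {
    for (i in 1:20) x <- sin(x) + cos(x)   # repeated vectorised work
    exp(x)
}

set.seed(1)
n <- 1e5
b <- runif(n)
a <- numeric(n)
idx <- sample(n, n / 100)                  # about 1% non-zero entries
a[idx] <- runif(length(idx), 1, 2)

## direct approach: evaluate the function on every element
t_direct <- system.time(res_direct <- a * slow_fun(b))

## book-keeping approach: evaluate only where a is non-zero
t_subset <- system.time({
    want <- a != 0
    res_subset <- numeric(n)
    res_subset[want] <- a[want] * slow_fun(b[want])
})

all.equal(res_direct, res_subset)          # both approaches should agree
c(direct = t_direct[["elapsed"]], subset = t_subset[["elapsed"]])
```

For a single run the elapsed times may be too small to compare reliably; wrapping each approach in replicate() is one option for getting steadier numbers.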

Upvotes: 2

Alex Brown

Reputation: 42872

Just replace your expression with:

ifelse(a==0,0,a*exp(b))

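I'd be surprised if this improved performance, though. R is interpreted, so the overhead of running ifelse is probably worse than the wasted exp invocations. Note also that ifelse evaluates a*exp(b) on the whole vector before selecting from it, so no calls to exp are actually avoided.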
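
A small illustration of how much work each form hands to exp(): ifelse() receives its arguments as complete vectors, so the function still sees every element of b, whereas explicit subsetting passes only the elements that matter. The counter n_evaluated and the wrapper counting_exp are hypothetical helpers added purely for this demonstration.

```r
## Hypothetical wrapper that records how many elements exp() is asked
## to evaluate (for demonstration only)
n_evaluated <- 0
counting_exp <- function(x) {
    n_evaluated <<- n_evaluated + length(x)
    exp(x)
}

a <- c(0, 0, 3, 0, 5)
b <- c(1, 2, 3, 4, 5)

res_ifelse <- ifelse(a == 0, 0, a * counting_exp(b))
n_ifelse <- n_evaluated    # 5: every element of b was passed to exp()

n_evaluated <- 0
want <- a != 0
res_subset <- numeric(length(a))
res_subset[want] <- a[want] * counting_exp(b[want])
n_subset <- n_evaluated    # 2: only the elements where a is non-zero
```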

Upvotes: 2

fabians

Reputation: 3473

Similar to DWin's suggestion:

> n <- 1e5
> nonzero <- .01
> b <- rnorm(n)
> a <- rep(0, n)
> a[1:(n*nonzero)] <- rnorm(n*nonzero)
> 
> system.time(replicate(100, {
+                   c <- a*exp(b)
+               }))
   user      system     elapsed 
   1.19        0.05        1.23 
> system.time(replicate(100, {
+                   zero <- abs(a) < .Machine$double.eps
+                   c <- a
+                   c[!zero] <- a[!zero]*exp(b[!zero])
+               }))
   user      system     elapsed 
   0.42        0.08        0.50 

Upvotes: 1

IRTFM

Reputation: 263332

You could accomplish that by indexing both vectors with a test for whatever situation you deem a waste. If the function is more costly than exp, it might make a difference:

a[ a != 0 ]*exp( b[ a != 0 ] )

Also be aware that testing floating-point numbers for exact equality has traps. You may want to look at zapsmall and all.equal as alternatives, depending on what the real problem is.

> 3/10 == 0.1*3
[1] FALSE
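
A brief sketch of the alternatives mentioned above. The tolerance tol is an arbitrary choice made for illustration (it happens to match the default used by all.equal), not a recommendation for any particular problem.

```r
x <- 3 / 10
y <- 0.1 * 3

x == y                      # FALSE: exact comparison of doubles
isTRUE(all.equal(x, y))     # TRUE: comparison up to a small tolerance

## testing for "zero" with an explicit tolerance (tol is an assumption)
tol <- sqrt(.Machine$double.eps)
a <- c(0, 1e-20, 0.5, -2)
is_zero <- abs(a) < tol     # TRUE TRUE FALSE FALSE

## zapsmall() rounds values that are tiny relative to the largest one to 0
z <- zapsmall(c(1, 1e-20))  # 1 0
```

Note that all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.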

Upvotes: 0
