LGTrader

Reputation: 2429

Rewrite loop using apply

Newbie question: This double loop over a data.frame of about 50K elements evaluates very slowly, taking over 30 seconds. I've read online that I should be using some form of the apply function to fix this, but so far I cannot get the code right. Starting with a first data.frame that holds gain results, the goal is to get a second data.frame where only the values greater than the target are filled in and all others are 0.

This code works:

ExcessGain = function(Value, Target){
  max(0,Value - Target)
}

Pcnt_O_O_x = data.frame()

for (j in 1:ncol(Pcnt_O_O)){
  for (i in 1:nrow(Pcnt_O_O)){
    Pcnt_O_O_x[i,j] = ExcessGain(Pcnt_O_O[i,j], GainTargetPcnt)
  }
}

Can I speed this up somehow using an apply function instead of the inside loop?

Upvotes: 3

Views: 336

Answers (1)

Simon O'Hanlon

Reputation: 59970

Your function looks like it is just subtracting a target value from the value of each cell in your array, and any negative values are replaced by 0. In that case you don't need any loops; you can just use R's built-in vectorisation to do this:

set.seed(123)
# If you have a data.frame of all numeric elements turn it into a matrix first
df <- as.matrix( data.frame( matrix( runif(25) , nrow = 5 ) ) )

target <- 0.5
df
#        X1        X2        X3         X4        X5
#1 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393
#2 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034
#3 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068
#4 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698
#5 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058

df2 <- df - target
df2
#          X1          X2          X3         X4        X5
#1 -0.21242248 -0.45444350  0.45683335  0.3998250 0.3895393
#2  0.28830514  0.02810549 -0.04666584 -0.2539123 0.1928034
#3 -0.09102308  0.39241904  0.17757064 -0.4579405 0.1405068
#4  0.38301740  0.05143501  0.07263340 -0.1720793 0.4942698
#5  0.44046728 -0.04338526 -0.39707532  0.4545036 0.1557058

df2[ df2 < 0 ] <- 0
df2
#        X1         X2        X3        X4        X5
#1 0.0000000 0.00000000 0.4568333 0.3998250 0.3895393
#2 0.2883051 0.02810549 0.0000000 0.0000000 0.1928034
#3 0.0000000 0.39241904 0.1775706 0.0000000 0.1405068
#4 0.3830174 0.05143501 0.0726334 0.0000000 0.4942698
#5 0.4404673 0.00000000 0.0000000 0.4545036 0.1557058
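The two steps above (subtract, then zero out negatives) can also be collapsed into one expression with `pmax()`, which is vectorised too. A minimal sketch, reusing `df` and `target` as defined above:

```r
# pmax() takes the element-wise maximum of its arguments, so
# pmax(df - target, 0) subtracts and clamps at zero in one step
set.seed(123)
df <- as.matrix(data.frame(matrix(runif(25), nrow = 5)))
target <- 0.5
df2 <- pmax(df - target, 0)
```

This gives the same result as the two-step version and reads closer to the original `max(0, Value - Target)` function.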

Here is some benchmarking to show the difference in speed when operating on a matrix as opposed to a data.frame. f.df( df ) and f.m( m ) are two functions operating on a data.frame and a matrix with 1 million elements respectively:

require( microbenchmark )
microbenchmark( f.df( df ) , f.m( m ) , times = 10L )

#Unit: milliseconds
#     expr        min         lq     median         uq        max neval
# f.df(df) 6944.09808 9009.39684 9233.18528 9533.75089 10036.5963    10
#   f.m(m)   37.26433   39.00189   40.46229   41.15626   130.6983    10

Operating on a matrix is two orders of magnitude quicker when the matrix is big.
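The benchmarked functions `f.df` and `f.m` are not shown in the answer; a plausible reconstruction (hypothetical, assuming each applies the same subtract-and-clamp operation to its input) would be:

```r
# Hypothetical reconstructions of the benchmarked functions (not given in
# the original answer): identical logic, differing only in input type.
target <- 0.5

# Operates on a data.frame: arithmetic and logical replacement go
# column-by-column through data.frame methods, which adds overhead
f.df <- function(df) { out <- df - target; out[out < 0] <- 0; out }

# Operates on a matrix: a single contiguous numeric vector underneath,
# so the same operations run much faster
f.m <- function(m) { out <- m - target; out[out < 0] <- 0; out }
```

Both return the same values; the timing gap comes purely from the container type.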

If you really need to use an apply function, you can apply to every cell of a matrix like so:

m <- matrix( runif(25) , nrow = 5 )
target <- 0.5
apply( m , 1:2 , function(x) max(x - target , 0 ) )
#         [,1]      [,2]       [,3]      [,4]      [,5]
#[1,] 0.4575807 0.0000000 0.15935928 0.0000000 0.1948637
#[2,] 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000
#[3,] 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000
#[4,] 0.3912719 0.0000000 0.06155316 0.1533290 0.0000000
#[5,] 0.3228921 0.4697041 0.23554353 0.1352888 0.0000000

Upvotes: 3
