user6291

Reputation: 541

Faster if statement evaluations in R

I am trying to figure out whether there is a better way to combine multiple conditions in an if clause in R so that it runs faster. The code below runs very fast on large datasets in the simple case and much slower in the more complicated case. Any suggestions are greatly appreciated! The tic-toc functions are at the very bottom of the question in case you would like to run the code yourself and see how fast it is.

To give some intuition of what the code is doing: the first chunk flags every (x, y) pair that is larger in both coordinates than at least one other pair. The pairs left in front are the ones that are not larger than any other pair in both x and y.

The second chunk of code does the same thing, but adds two conditions: if any of the x values are equal to each other, check which one has the lowest y value; likewise, if any of the y values are equal to each other, check which one has the lowest x value.

So, running the code in the simple case I have the following:

tic()

x = runif(10000)
y = runif(10000)

front = 1:length(x)

for(i in 1:length(x)){
    for(n in 1:length(x)){
        if((x[i]>x[n]  &  y[i]>y[n])){
            front[i] = NA
            break
        }
    }
}

toc()

As you can see, I am only evaluating the single condition x[i]>x[n] & y[i]>y[n].

> toc()
elapsed 
   1.28

The code above runs in 1.28 seconds. Now, with three conditions to check, I have the following:

tic()

x = runif(10000)
y = runif(10000)

front = 1:length(x)

for(i in 1:length(x)){
    for(n in 1:length(x)){
        if((x[i]>x[n]  &  y[i]>y[n]) | (x[i]==x[n] & y[i]!=min(y[which(x==x[i])])) | (y[i]==y[n] & x[i]!=min(x[which(y==y[i])]))){
            front[i] = NA
            break
        }
    }
}


toc()

As you can see, I now have to check three conditions inside my if statement, namely:

(x[i]>x[n]  &  y[i]>y[n]) | (x[i]==x[n] & y[i]!=min(y[which(x==x[i])])) | (y[i]==y[n] & x[i]!=min(x[which(y==y[i])]))

However, this leads to a huge computational burden in R and makes the code much slower.

> toc()
elapsed 
  74.47

The newly adapted code has slowed down considerably, to 74.47 seconds. I am looking for either alternative function calls that would speed up my code, or a "better" way of rewriting it so that it is not so slow.

Here is the code for the tic-toc function if needed:

tic <- function(gcFirst = TRUE, type=c("elapsed", "user.self", "sys.self"))
{
   type <- match.arg(type)
   assign(".type", type, envir=baseenv())
   if(gcFirst) gc(FALSE)
   tic <- proc.time()[type]         
   assign(".tic", tic, envir=baseenv())
   invisible(tic)
}

toc <- function()
{
   type <- get(".type", envir=baseenv())
   toc <- proc.time()[type]
   tic <- get(".tic", envir=baseenv())
   print(toc - tic)
   invisible(toc)
}
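
For anyone reproducing the timings without these helpers, base R's system.time() reports the same kind of elapsed figure for an expression. Below is a minimal sketch of the simple case timed that way. The size n = 500 is my own choice to keep the run short, not the size used for the numbers above, and I have written the condition with && since both sides are scalars. As far as I know, R 4.1.0 and later lock the base environment, so the assign(..., envir=baseenv()) calls in tic() may fail there. Treat that as an assumption to check on your own version.

```r
n <- 500  # assumed smaller size for a quick run (the question uses 10000)
x <- runif(n)
y <- runif(n)
front <- seq_along(x)

# system.time() evaluates the braces in the calling environment,
# so the assignments to front are visible afterwards
timing <- system.time({
  for (i in seq_along(x)) {
    for (k in seq_along(x)) {
      if (x[i] > x[k] && y[i] > y[k]) {
        front[i] <- NA
        break
      }
    }
  }
})

timing["elapsed"]
```

system.time() returns user, system and elapsed times together, so the same call covers the three types that tic() accepts.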

EDIT for sashkello

So my code now looks like this:

library(mvtnorm)
#Here are the variables I will be working with 

> x
 [1] 0.53137100 0.75357474 0.87904120 0.29727488 0.00000000 0.00000000
 [7] 0.00000000 0.00000000 0.00000000 0.04059217
> y
 [1]  4.873500  3.896917  1.258215  5.776484 12.475491  5.273784 13.803158
 [8]  4.472204  2.629839  6.689242
> front
 [1] NA NA  3 NA NA NA NA NA  9 NA
> all.preds
[1] 0.596905183 0.027696850 1.005666896 0.007688514 3.900000000

    x = x[!is.na(front)]
    y = y[!is.na(front)]

    mu = c(all.preds[1],all.preds[3])
    sigma = matrix(c(all.preds[2],0,0,all.preds[4]),nrow=2)

    z = rmvnorm(10000,mu,sigma)
    z[,1] = sapply(z[,1],function(x){max(x,0)})

    points(z,col="black",pch=19,cex=.01)
    temp = 1:nrow(z)

    for(i in 1:length(temp)){
        cond1 = z[i,2]!=min(z[which(z[,1]==z[i,1]),2])
        cond2 = z[i,1]!=min(z[which(z[,2]==z[i,2]),1])
        for(n in 1:length(x)){
            if((z[i,1]>x[n]  &  z[i,2]>y[n]) | (z[i,1]==x[n] & cond1) | (z[i,2]==y[n] & cond2)){
                temp[i] = NA
                break
            }
        }
    }
    prop = sum(!is.na(temp))/length(temp)

and the cond1 and cond2 statements still take horribly long. Any suggestions?

Upvotes: 0

Views: 2514

Answers (2)

Roland

Reputation: 132706

Since you asked for it, here is an efficient way to calculate cond1 outside of a for loop (which you probably don't need at all):

#some data
set.seed(42)
z <- matrix(sample(1:5, 200, TRUE), ncol=2)

#your loop
cond1 <- logical(100)

for (i in 1:100) {
    cond1[i] = z[i,2]!=min(z[which(z[,1]==z[i,1]),2])
}

#alternative
library(data.table)
DT <- data.table(z)
DT[, id:=.I]

DT[, cond1:=V2!=min(V2), by=V1]

#compare results
identical(DT[, cond1], cond1)
#[1] TRUE
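
If you would rather not add a data.table dependency, the same grouped comparison can be sketched in base R with ave(), which applies a function within groups and returns a vector aligned with its input. This is only a sketch on the same toy data. The names cond1_loop, group_min and cond1_ave are made up here for the comparison.

```r
# same toy data as above
set.seed(42)
z <- matrix(sample(1:5, 200, TRUE), ncol = 2)

# reference: the loop from the question
cond1_loop <- logical(nrow(z))
for (i in seq_len(nrow(z))) {
  cond1_loop[i] <- z[i, 2] != min(z[which(z[, 1] == z[i, 1]), 2])
}

# ave() returns, for each row, the minimum of column 2
# among all rows that share that row's value in column 1
group_min <- ave(z[, 2], z[, 1], FUN = min)
cond1_ave <- z[, 2] != group_min

# compare results
identical(cond1_loop, cond1_ave)
```

Swapping the two columns in the ave() call gives cond2 in the same way.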

Upvotes: 2

sashkello

Reputation: 17871

You can put y[i]!=min(y[which(x==x[i])]) and x[i]!=min(x[which(y==y[i])]) before the second loop, because they both only involve i.

for(i in 1:length(x)){
    cond1 = y[i]!=min(y[which(x==x[i])])
    cond2 = x[i]!=min(x[which(y==y[i])])
    for(n in 1:length(x)){
        if((x[i]>x[n]  &  y[i]>y[n]) | (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2)){
            front[i] = NA
            break
        }
    }
}

This should speed things up significantly, because min and which each scan the whole vector and you are currently running them on every iteration of the second loop.
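
Going one step further than hoisting, both conditions could be computed for every i in one pass, and the inner loop replaced by a vectorised any(). The sketch below is my guess at how that might look. It has not been timed against the data in the question. front_loop and front_vec are hypothetical helper names, the grouped minima use base R's ave(), and the loop version uses && and ||, which short-circuit on scalars.

```r
# Hypothetical helper: the double loop with cond1/cond2 hoisted out of the inner loop
front_loop <- function(x, y) {
  front <- seq_along(x)
  for (i in seq_along(x)) {
    cond1 <- y[i] != min(y[x == x[i]])
    cond2 <- x[i] != min(x[y == y[i]])
    for (n in seq_along(x)) {
      if ((x[i] > x[n] && y[i] > y[n]) ||
          (x[i] == x[n] && cond1) ||
          (y[i] == y[n] && cond2)) {
        front[i] <- NA
        break
      }
    }
  }
  front
}

# Hypothetical helper: same rule, conditions precomputed, inner loop vectorised
front_vec <- function(x, y) {
  cond1 <- y != ave(y, x, FUN = min)  # TRUE where y[i] is not the smallest y for its x
  cond2 <- x != ave(x, y, FUN = min)  # TRUE where x[i] is not the smallest x for its y
  front <- seq_along(x)
  for (i in seq_along(x)) {
    hit <- (x[i] > x & y[i] > y) | (x[i] == x & cond1[i]) | (y[i] == y & cond2[i])
    if (any(hit)) front[i] <- NA
  }
  front
}

# small made-up data set with plenty of ties in x and y
set.seed(1)
x <- sample(1:20, 300, TRUE) / 20
y <- sample(1:20, 300, TRUE) / 20
identical(front_loop(x, y), front_vec(x, y))
```

One caveat: ave() groups through factor(), so doubles that differ only beyond roughly 15 significant digits would land in the same group, whereas == compares them exactly. For data with real ties, as in the EDIT above where x contains repeated zeros, that should not matter.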

Upvotes: 3
