Concerned_Citizen

Reputation: 6835

R Code Taking Too Long To Run

I have the following code running and it's taking a long time. How do I know whether it's still doing its job or has gotten stuck somewhere?

noise4<-NULL;
for(i in 1:length(noise3))
{
    if(is.na(noise3[i])==TRUE)
    {
    next;
    }
    else
    {
    noise4<-c(noise4,noise3[i]);
    }
}

noise3 is a vector with 2418233 data points.

Upvotes: 5

Views: 21776

Answers (5)

Rola

Reputation: 1936

In one case I faced, updating all the packages in use under RStudio resolved the issue.

Upvotes: 0

Iterator

Reputation: 20560

The others have all given correct ways to solve the same problem, so you needn't worry about speed. @BenBolker also gave a good pointer regarding regular output.

A different thing to note is that if you find yourself in a loop, you can break out of it and find the value of i. Assuming that re-starting from that value of i won't harm things, i.e. using that value twice won't be a problem, you can restart. Or, you can just finish the job as the others have stated.
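As a rough sketch of that restart idea: the vector x below is a made-up stand-in for noise3, and the break only simulates interrupting the loop by hand. The point is that i keeps its value after the loop stops, so you can pick up from there.

```r
# Hypothetical stand-in data; in the question this would be noise3.
x <- c(1.5, NA, 2.5, NA, 3.5, 4.5)
out <- numeric(0)

for (i in seq_along(x)) {
  if (i == 4) break                 # simulate an interrupt before index 4 is handled
  if (!is.na(x[i])) out <- c(out, x[i])
}

# i still holds the index the loop stopped on, so resume from it.
resume_from <- i
for (i in resume_from:length(x)) {
  if (!is.na(x[i])) out <- c(out, x[i])
}
out
```

If you interrupt in the middle of an iteration rather than before it, check whether index i was already handled before resuming, otherwise that element could be appended twice.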

A separate trick is that if the loop is slow (and can't be vectorized or else you're not eager to break out of the loop), AND you don't have any reporting, you can still look for an external method to see if R is actually consuming cycles on your computer. In Linux, the top command is your best bet. On Windows, the task manager will do the trick (I prefer to use the SysInternals / Microsoft program Process Explorer). 'top' also exists on Macs, though I believe there are some other more popular tools.

One other word of advice: if you have a really long loop to run, I strongly encourage saving the results regularly. I typically create a file with a name like: myPrefix_YYYYMMDDHHMMSS.rdat . This way everything can go to hell and you can still start your loop where you left off.
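Here is a minimal sketch of that kind of checkpointing. The data, the save_every interval, and the use of tempdir() for the file location are all assumptions for illustration; only the myPrefix_YYYYMMDDHHMMSS.rdat naming pattern comes from the advice above.

```r
# Hypothetical stand-in data and settings (not from the original post).
x <- c(1.5, NA, 2.5, NA, 3.5, 4.5)
save_every <- 2                              # checkpoint interval, in iterations
prefix <- file.path(tempdir(), "myPrefix")   # assumed location for the files

out <- numeric(length(x))   # pre-allocated; trimmed after the loop
n_kept <- 0                 # number of non-NA values stored so far
checkpoint_file <- NULL

for (i in seq_along(x)) {
  if (!is.na(x[i])) {
    n_kept <- n_kept + 1
    out[n_kept] <- x[i]
  }
  if (i %% save_every == 0) {
    # Timestamped file name, e.g. myPrefix_20240131235959.rdat
    stamp <- format(Sys.time(), "%Y%m%d%H%M%S")
    checkpoint_file <- paste0(prefix, "_", stamp, ".rdat")
    save(i, n_kept, out, file = checkpoint_file)
  }
}
out <- out[seq_len(n_kept)]
```

To pick up after a crash, load() the newest checkpoint file; it restores i, n_kept and out, and the loop can continue from i + 1.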

I don't always iterate, but when I do, I use these tricks. Stay speedy, my friend.

Upvotes: 3

Ben Bolker

Reputation: 226087

The other answers have given you much, much better ways to do the task that you actually set out to achieve (removing NA values in your data), but an answer to the specific question you asked ("how do I know if R is actually working or if it has instead gotten stuck?") is to introduce some output (cat) statements in your loop, as follows:

rpt <- 10000  ## reporting interval
noise4<-NULL;
for(i in 1:length(noise3))
{
    if (i %% rpt == 0) cat(i,"\n")
    if(is.na(noise3[i])==TRUE)
    {
    next;
    }
    else
    {
    noise4<-c(noise4,noise3[i]);
    }
}

If you run this code you can immediately see that it slows down radically as it gets farther into the loop (a consequence of the failure to pre-allocate space) ...
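If you want to see the slowdown as numbers rather than by watching the console, one option is to time each reporting block. This is only a sketch: x is a small made-up stand-in for noise3 so it finishes quickly, and chunk_secs is a name I've introduced to hold the per-block timings.

```r
set.seed(1)
x <- rnorm(40000)                 # hypothetical stand-in for noise3
x[sample(length(x), 100)] <- NA

rpt <- 10000                      # reporting interval
out <- NULL
chunk_secs <- numeric(0)          # elapsed seconds for each block of rpt iterations
t0 <- proc.time()[["elapsed"]]

for (i in seq_along(x)) {
  if (!is.na(x[i])) out <- c(out, x[i])   # grows (and copies) out on each append
  if (i %% rpt == 0) {
    t1 <- proc.time()[["elapsed"]]
    chunk_secs <- c(chunk_secs, t1 - t0)
    cat(i, ":", round(t1 - t0, 2), "s for the last", rpt, "iterations\n")
    t0 <- t1
  }
}
```

The exact timings depend on your machine and R version, so I haven't listed any here.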

Upvotes: 5

David Heffernan

Reputation: 612864

You just want to remove the NA values. Do it like this:

noise4 <- noise3[!is.na(noise3)]

This will be pretty much instant.

Or as Joshua suggests, a more readable alternative:

noise4 <- na.omit(noise3)

Your code was slow because:

  1. It uses explicit loops which tend to be slow under the R interpreter.
  2. You reallocate memory every iteration.

The memory reallocation is probably the biggest handicap to your code.
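A small sketch comparing the two vectorized forms, using a made-up vector x in place of noise3. Both give the same values; the one difference worth knowing is that na.omit attaches an na.action attribute recording which positions were dropped.

```r
x <- c(1.5, NA, 2.5, NA, 3.5)   # hypothetical stand-in for noise3

by_index <- x[!is.na(x)]        # logical indexing: plain numeric vector
by_omit  <- na.omit(x)          # same values, plus an "na.action" attribute

by_index
attributes(by_omit)             # positions of the removed NAs and class "omit"
identical(by_index, as.vector(by_omit))
```

If the extra attribute gets in the way later, as.vector() strips it.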

Upvotes: 11

Joshua Ulrich

Reputation: 176648

I wanted to illustrate the benefits of pre-allocation, so I tried to run your code... but I killed it after ~5 minutes. I recommend you use noise4 <- na.omit(noise3) as I said in my comments. This code is solely for illustrative purposes.

# Create some random data
set.seed(21)
noise3 <- rnorm(2418233)
noise3[sample(2418233, 100)] <- NA

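noise <- function(noise3) {
  # Pre-allocate the output to the number of non-NA values
  noise4 <- vector("numeric", sum(!is.na(noise3)))
  j <- 0  # index of the last filled slot in noise4
  for(i in seq_along(noise3)) {
    if(is.na(noise3[i])) {
      next
    } else {
      # Write to the next free slot, not to position i
      j <- j + 1
      noise4[j] <- noise3[i]
    }
  }
  noise4  # return the filtered vector
}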

system.time(noise(noise3)) # MUCH less than 5+ minutes
#    user  system elapsed 
#    9.50    0.44    9.94 

# Let's see what we gain from compiling
library(compiler)
cnoise <- cmpfun(noise)
system.time(cnoise(noise3))  # a decent reduction
#    user  system elapsed 
#    3.46    0.49    3.96 

Upvotes: 5
