Reputation: 6835
I have the following code running, and it is taking a long time. How do I know whether it is still doing its job or has gotten stuck somewhere?
noise4<-NULL;
for(i in 1:length(noise3))
{
  if(is.na(noise3[i])==TRUE)
  {
    next;
  }
  else
  {
    noise4<-c(noise4,noise3[i]);
  }
}
noise3 is a vector with 2418233 data points.
Upvotes: 5
Views: 21776
Reputation: 1936
In one case I faced, updating all the packages in use under RStudio resolved the issue.
Upvotes: 0
Reputation: 20560
The others have all given correct ways to solve the same problem, so that you needn't worry about speed. @BenBolker also gave a good pointer regarding regular output.
A different thing to note is that if you find yourself in a loop, you can break out of it and find the value of i. Assuming that restarting from that value of i won't harm things, i.e. using that value twice won't be a problem, you can restart. Or, you can just finish the job as the others have stated.
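A minimal sketch of that restart idea, under assumptions that are not in the original question: the data is a small stand-in vector, the result is a pre-allocated logical vector named seen (a hypothetical name), and a break stands in for a manual interrupt. Because each index writes to its own slot, processing the stopping index twice is harmless here.

```r
x <- c(10, NA, 30, 40, NA, 60)   # small stand-in for the real data
seen <- logical(length(x))       # pre-allocated, so redoing an index is harmless

# First run: `break` stands in for a manual interrupt (Esc or Ctrl-C).
for (i in seq_along(x)) {
  if (i == 4) break
  seen[i] <- !is.na(x[i])
}
stopped_at <- i   # at top level, i keeps the value it had when the loop stopped

# Restart from the same index; writing seen[stopped_at] again does no damage.
for (i in stopped_at:length(x)) {
  seen[i] <- !is.na(x[i])
}
result <- x[seen]
```

With the original loop, which appends to noise4, repeating an index could append the same value twice, so there it would be safer to check whether the last value was already stored before restarting.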
A separate trick is that if the loop is slow (and can't be vectorized or else you're not eager to break out of the loop), AND you don't have any reporting, you can still look for an external method to see if R is actually consuming cycles on your computer. In Linux, the top command is your best bet. On Windows, the task manager will do the trick (I prefer to use the SysInternals / Microsoft program Process Explorer). 'top' also exists on Macs, though I believe there are some other more popular tools.
One other word of advice: if you have a really long loop to run, I strongly encourage saving the results regularly. I typically create a file with a name like myPrefix_YYYYMMDDHHMMSS.rdat. This way everything can go to hell and you can still start your loop where you left off.
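One way the checkpointing idea might look, as a rough sketch rather than the exact routine described above. The names checkpointed_loop, process_item, save_every and prefix are all hypothetical, process_item is a trivial stand-in for the real slow work, and files go to tempdir() purely for illustration.

```r
# Hypothetical stand-in for the real, slow per-element work.
process_item <- function(v) v^2

# Hypothetical helper: runs the loop from `start`, saving `results` and the
# current index `i` to a timestamped file every `save_every` iterations.
checkpointed_loop <- function(x, start = 1L, results = numeric(length(x)),
                              save_every = 1000L, prefix = "myPrefix",
                              dir = tempdir()) {
  last_file <- NA_character_
  for (i in seq_along(x)) {
    if (i < start) next
    results[i] <- process_item(x[i])
    if (i %% save_every == 0L || i == length(x)) {
      stamp <- format(Sys.time(), "%Y%m%d%H%M%S")
      last_file <- file.path(dir, paste0(prefix, "_", stamp, ".rdat"))
      save(results, i, file = last_file)
    }
  }
  list(results = results, last_file = last_file)
}

x <- c(1, 2, 3, 4, 5)
out <- checkpointed_loop(x, save_every = 2L)

# After a crash: load the most recent file to recover `results` and `i`,
# then call checkpointed_loop() again with start = i + 1.
env <- new.env()
load(out$last_file, envir = env)
```

Checkpoints written within the same second share a file name and overwrite each other, which is acceptable here because only the latest state matters.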
I don't always iterate, but when I do, I use these tricks. Stay speedy, my friend.
Upvotes: 3
Reputation: 226087
The other answers have given you much, much better ways to do the task that you actually set out to achieve (removing NA values in your data), but an answer to the specific question you asked ("how do I know if R is actually working or if it has instead gotten stuck?") is to introduce some output (cat) statements in your loop, as follows:
rpt <- 10000 ## reporting interval
noise4<-NULL;
for(i in 1:length(noise3))
{
  if (i %% rpt == 0) cat(i,"\n")
  if(is.na(noise3[i])==TRUE)
  {
    next;
  }
  else
  {
    noise4<-c(noise4,noise3[i]);
  }
}
If you run this code you can immediately see that it slows down radically as it gets farther into the loop (a consequence of the failure to pre-allocate space) ...
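As an alternative to printing the index with cat, base R's utils package provides txtProgressBar. The following is a minimal sketch, assuming a small stand-in vector for noise3 and a pre-allocated logical vector named keep (a hypothetical name), so the loop does not slow down as it proceeds.

```r
noise3 <- c(1.5, NA, 2.5, NA, 3.5, 4.5)  # small stand-in for the real data
n <- length(noise3)
rpt <- 2                                  # reporting interval
keep <- logical(n)                        # pre-allocated result

pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  keep[i] <- !is.na(noise3[i])
  # Update the bar only every rpt iterations (and at the end) to limit overhead.
  if (i %% rpt == 0 || i == n) setTxtProgressBar(pb, i)
}
close(pb)

noise4 <- noise3[keep]
```

For a vector with millions of elements, a much larger rpt (such as the 10000 used above) keeps the cost of redrawing the bar negligible.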
Upvotes: 5
Reputation: 612864
You just want to remove the NA values. Do it like this:
noise4 <- noise3[!is.na(noise3)]
This will be pretty much instant.
Or as Joshua suggests, a more readable alternative:
noise4 <- na.omit(noise3)
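Your code was slow because noise4 <- c(noise4, noise3[i]) reallocates and copies the whole vector on every iteration of an explicit loop. That memory reallocation is probably the biggest handicap to your code.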
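A small illustration of the two one-liners on a toy vector (the name x is only for this example). One difference worth knowing: na.omit attaches an "na.action" attribute recording which positions were dropped, whereas logical indexing returns a plain vector.

```r
x <- c(1, NA, 3, NA, 5)

by_index <- x[!is.na(x)]   # plain numeric vector
by_omit  <- na.omit(x)     # same values, plus an "na.action" attribute

# Positions of the removed NA values, as recorded by na.omit
dropped <- as.integer(attr(by_omit, "na.action"))

# Stripping the attributes makes the two results identical
same_values <- identical(as.vector(by_omit), by_index)
```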
Upvotes: 11
Reputation: 176648
I wanted to illustrate the benefits of pre-allocation, so I tried to run your code... but I killed it after ~5 minutes. I recommend you use noise4 <- na.omit(noise3), as I said in my comments. This code is solely for illustrative purposes.
# Create some random data
set.seed(21)
noise3 <- rnorm(2418233)
noise3[sample(2418233, 100)] <- NA
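noise <- function(noise3) {
  # Pre-allocate
  noise4 <- vector("numeric", sum(!is.na(noise3)))
  j <- 0  # number of non-NA values stored so far
  for(i in seq_along(noise3)) {
    if(is.na(noise3[i])) {
      next
    } else {
      j <- j + 1
      noise4[j] <- noise3[i]  # fill the next free slot, not slot i
    }
  }
  noise4  # return the filtered vector
}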
system.time(noise(noise3)) # MUCH less than 5+ minutes
# user system elapsed
# 9.50 0.44 9.94
# Let's see what we gain from compiling
library(compiler)
cnoise <- cmpfun(noise)
system.time(cnoise(noise3)) # a decent reduction
# user system elapsed
# 3.46 0.49 3.96
Upvotes: 5