Looping over rows in a dataframe

Question

Suppose I need to loop over the rows in a data frame for some reason.

I create a simple data.frame:

df <- data.frame(id = sample(1e6, 1e7, replace = TRUE))

It seems that f2 is much slower than f1, while I expected them to be equivalent.

f1 <- function(v){
    for (obs in 1:(1e6)){
        a <- v[obs]
    }
    a
}
system.time(f1(df$id))

f2 <- function(){
    for (obs in 1:(1e6)){
        a <- df$id[obs]
    }
    a
}
system.time(f2())

Would you know why? Do they use exactly the same amount of memory?

Josh O'Brien · Accepted Answer

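If you instead write your timings like this, and recognize that df$x is really a function call (to `$`(df, x)), the mystery disappears. f1 evaluates df$id once, when the argument is first used, whereas f2 makes that call on every one of its 1e6 iterations: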

system.time(for(i in 1:1e6) df$x)
#    user  system elapsed 
#    8.52    0.00    8.53 
system.time(for(i in 1) df$x)
#    user  system elapsed 
#       0       0       0
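
A minimal sketch of the usual workaround, assuming you control the loop: extract the column once and index the plain vector inside the loop. The function names slow and fast, and the smaller data size, are made up for illustration and are not from the question; the actual timings will depend on your machine and R version.

```r
# Smaller data than in the question so the sketch runs quickly (assumption).
set.seed(1)
n  <- 1e4
df <- data.frame(id = sample(1e3, n, replace = TRUE))

# df$id is just another way of writing the function call `$`(df, id).
same_call <- identical(df$id, `$`(df, id))

# Slow pattern: `$` is called on the data frame in every iteration.
slow <- function(df, n) {
  a <- NULL
  for (obs in seq_len(n)) a <- df$id[obs]
  a
}

# Faster pattern: extract the column once, then index the plain vector.
fast <- function(df, n) {
  v <- df$id
  a <- NULL
  for (obs in seq_len(n)) a <- v[obs]
  a
}

system.time(res_slow <- slow(df, n))
system.time(res_fast <- fast(df, n))

# Both versions return the same value; only the per-iteration cost differs.
identical(res_slow, res_fast)
```

This is the same thing f1 does implicitly: its argument v is evaluated once, so the loop body never touches the data frame.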
