Increase speed with rbindlist does not work with two for loops

Question

I have a dataset that looks like this one:

library(data.table)
test <- data.table(Weight = sample(x = 20:100, 500, replace = TRUE), y = rnorm(500), z = rnorm(500))

> head(test)
   Weight          y           z
1:     87 -0.7946846 -0.03136408
2:     97  1.6570765  0.61080309
3:     80  1.1592073 -0.09389739
4:     23 -0.0268602 -1.36896141
5:     32  1.3171078 -2.19978789
6:     78 -0.1961162  0.62026338

I want to duplicate each row as many times as the value in its Weight column. I have achieved this with the following code (including a progress bar):

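pb <- txtProgressBar(min = 0, max = nrow(test), style = 3)  # progress bar
Testoutcome <- test[0]  # empty table with the same columns
system.time(
  for (i in 1:nrow(test)){
    setTxtProgressBar(pb,i)
    # append row i once per unit of Weight
    for (j in 1:test[i,]$Weight){
      Testoutcome <- rbind(Testoutcome, test[i,])
    }
  })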
user  system elapsed 
  32.91    0.08   33.57

I found a post here that explains that rbindlist is much faster than rbind. So I modified the code like this:

Testoutcome <- test[0]  # reset the result before timing again
system.time(
  for (i in 1:nrow(test)){
    setTxtProgressBar(pb,i)
    for (j in 1:test[i,]$Weight){
      Testoutcome <- rbindlist(list(Testoutcome, test[i,]))
    }
  })
user  system elapsed 
  27.72    0.05   28.31

So rbindlist barely helps here. My actual dataset is about 1,000 times larger, and the loop takes forever. Any ideas on how to speed this up? Maybe I should move the bind outside the loop?

Frank · Accepted Answer

This should be fast, and is quite simple:

test[rep(1:.N,Weight)]
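
To spell out why this works, here is a minimal sketch on the same kind of data. The seed and the names idx, expanded, pieces and expanded2 are only illustrative assumptions, not part of the question. The idea is that rep() builds a vector of row numbers in which row i appears Weight[i] times, so a single subset replaces both loops. The second half shows the "bind outside the loop" idea from the question (collect the pieces in a list, call rbindlist once), which should give the same result but is not needed here.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
test <- data.table(Weight = sample(20:100, 500, replace = TRUE),
                   y = rnorm(500), z = rnorm(500))

# Row number i is repeated Weight[i] times: 1,1,...,2,2,...
idx <- rep(seq_len(nrow(test)), test$Weight)

# One subset call produces all duplicated rows at once
expanded <- test[idx]

# Alternative: keep a loop, but collect the pieces and bind only once
pieces <- lapply(seq_len(nrow(test)),
                 function(i) test[rep(i, test$Weight[i])])
expanded2 <- rbindlist(pieces)
```

The original loops are slow mainly because every rbind or rbindlist call copies the whole growing table, so the cost grows roughly with the square of the output size. Swapping rbind for rbindlist inside the loop does not change that; binding once, or not binding at all as in the one-liner, does.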
