Tim_Utrecht

Reputation: 1519

Increase speed with rbindlist does not work with two for loops

I have a dataset that looks like this one:

library(data.table)
test <- data.table(Weight = sample(x = 20:100, 500, replace = TRUE), y = rnorm(500), z = rnorm(500))

> head(test)
   Weight          y           z
1:     87 -0.7946846 -0.03136408
2:     97  1.6570765  0.61080309
3:     80  1.1592073 -0.09389739
4:     23 -0.0268602 -1.36896141
5:     32  1.3171078 -2.19978789
6:     78 -0.1961162  0.62026338

I want to duplicate each row as many times as the value in its Weight column. I achieved this with the following code (I included a progress bar):

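pb <- txtProgressBar(min = 0, max = nrow(test), style = 3)  # progress bar updated in the loop
Testoutcome <- data.table()                                 # start from an empty result
system.time(
  for (i in 1:nrow(test)){
    setTxtProgressBar(pb,i)
    for (j in 1:test[i,]$Weight){
      Testoutcome <- rbind(Testoutcome, test[i,])
    }
  })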
user  system elapsed 
  32.91    0.08   33.57 

I found a post here that explains that rbindlist is much faster than rbind. So I modified the code like this:

Testoutcome <- data.table()  # reset the result so the two timings are comparable
system.time(
  for (i in 1:nrow(test)){
    setTxtProgressBar(pb,i)
    for (j in 1:test[i,]$Weight){
      Testoutcome <- rbindlist(list(Testoutcome, test[i,]))
    }
  })
user  system elapsed 
  27.72    0.05   28.31

So the switch makes little difference. My actual dataset is about 1,000 times larger and the loop takes forever. Any ideas on how to speed this up? Maybe I should move the bind outside the loop?

Upvotes: 2

Views: 228

Answers (1)

Frank

Reputation: 66819

This should be fast, and is quite simple:

test[rep(1:.N,Weight)]
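
To illustrate why this works, here is a minimal sketch on a three-row toy table (the names toy, idx, expanded, pieces and expanded_bind are made up for this example, and the seed is an arbitrary choice). rep() repeats each row index as many times as that row's Weight, so a single subsetting call builds the whole result. The loops in the question are slow with either rbind or rbindlist because every iteration copies the entire table accumulated so far. The second variant shows the "bind outside the loop" idea from the question: collect the pieces in a list and call rbindlist once.

```r
library(data.table)

set.seed(1)  # assumption: arbitrary seed, only so the toy data is reproducible
toy <- data.table(Weight = c(2L, 3L, 1L), y = rnorm(3), z = rnorm(3))

# rep() repeats row index i exactly Weight[i] times: 1 1 2 2 2 3
idx <- rep(seq_len(nrow(toy)), toy$Weight)

# One subsetting call builds the full result; nothing grows inside a loop
expanded <- toy[idx]

# Variant closer to the question: collect the pieces in a list, bind once
pieces <- lapply(seq_len(nrow(toy)), function(i) toy[rep(i, toy$Weight[i])])
expanded_bind <- rbindlist(pieces)

# Both approaches yield sum(Weight) rows in the same order
stopifnot(nrow(expanded) == sum(toy$Weight))
stopifnot(identical(expanded$Weight, expanded_bind$Weight))
```

On the toy table the result has 2 + 3 + 1 = 6 rows. The one-liner above is the same idea written with .N, which is the number of rows in test.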

Upvotes: 4
