Reputation: 1519
I have a dataset that looks like this one:
library(data.table)
test <- data.table(Weight = sample(20:100, 500, replace = TRUE), y = rnorm(500), z = rnorm(500))
> head(test)
   Weight          y           z
1:     87 -0.7946846 -0.03136408
2:     97  1.6570765  0.61080309
3:     80  1.1592073 -0.09389739
4:     23 -0.0268602 -1.36896141
5:     32  1.3171078 -2.19978789
6:     78 -0.1961162  0.62026338
I want to duplicate each row as many times as the value in its Weight column. I have achieved this with the following code (I included a progress bar):
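pb <- txtProgressBar(min = 0, max = nrow(test), style = 3)  # assumed setup; not shown in the original post
Testoutcome <- test[0]                                      # assumed: start from an empty copy of test
system.time(
  for (i in seq_len(nrow(test))) {
    setTxtProgressBar(pb, i)
    # append row i once per unit of Weight
    for (j in seq_len(test$Weight[i])) {
      Testoutcome <- rbind(Testoutcome, test[i, ])
    }
  }
)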
   user  system elapsed
  32.91    0.08   33.57
I found a post here that explains that rbindlist is much faster than rbind. So I modified the code like this:
Testoutcome <- test[0]  # assumed: reset to an empty table before timing again
system.time(
  for (i in seq_len(nrow(test))) {
    setTxtProgressBar(pb, i)
    for (j in seq_len(test$Weight[i])) {
      Testoutcome <- rbindlist(list(Testoutcome, test[i, ]))
    }
  }
)
   user  system elapsed
  27.72    0.05   28.31
So that is only a modest improvement. My actual dataset is about 1,000 times larger, and the loop takes forever. Any ideas on how to speed this up? Maybe I should move the bind outside the loop?
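To make the "bind outside the loop" idea concrete, here is a rough sketch of two ways it might look. Neither is code I have benchmarked on the real data; the names pieces, bound_once and expanded are placeholders introduced only for illustration, and set.seed() is there just so the sketch is reproducible. The first variant keeps a loop but calls rbindlist() a single time; the second avoids the loop by repeating each row index Weight times.

```r
library(data.table)

set.seed(1)  # only to make this sketch reproducible
test <- data.table(Weight = sample(20:100, 500, replace = TRUE),
                   y = rnorm(500), z = rnorm(500))

# Option 1: bind once, outside the loop.
# Store the repeated rows in a pre-allocated list, then call rbindlist() once.
pieces <- vector("list", nrow(test))   # hypothetical helper list
for (i in seq_len(nrow(test))) {
  pieces[[i]] <- test[rep(i, test$Weight[i])]
}
bound_once <- rbindlist(pieces)

# Option 2: no explicit loop.
# Repeat each row index Weight[i] times and subset in a single step.
expanded <- test[rep(seq_len(.N), Weight)]

# Sanity check: one output row per unit of weight.
nrow(expanded) == sum(test$Weight)
```

Both variants should give the same table in the same row order; the second presumably scales better because the result is allocated once instead of being copied on every iteration.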
Upvotes: 2
Views: 228