makbuk

Reputation: 133

Repeat a data.frame N times and add a repeat-number column

I have the following data frame and I want to repeat it N times:

dc <- read.table(text = "from    1    2    3    4    5
    1 0.01 0.02 0.03 0.04 0.05
    2 0.06 0.07 0.08 0.09 0.10
    3 0.11 0.12 0.13 0.14 0.15
    4 0.16 0.17 0.18 0.19 0.20
    5 0.21 0.22 0.23 0.24 0.25", header = TRUE)

n <- 20
ddr <- NA

for(i in 1:n) {
  ddr <- rbind(ddr, cbind(dc,i))
}

As a result, I would like to receive:

from   X1   X2   X3   X4   X5  i
1 0.01 0.02 0.03 0.04 0.05  1
2 0.06 0.07 0.08 0.09 0.10  1
3 0.11 0.12 0.13 0.14 0.15  1
4 0.16 0.17 0.18 0.19 0.20  1
5 0.21 0.22 0.23 0.24 0.25  1
1 0.01 0.02 0.03 0.04 0.05  2
2 0.06 0.07 0.08 0.09 0.10  2
3 0.11 0.12 0.13 0.14 0.15  2
4 0.16 0.17 0.18 0.19 0.20  2
5 0.21 0.22 0.23 0.24 0.25  2
.............................
1 0.01 0.02 0.03 0.04 0.05 20
2 0.06 0.07 0.08 0.09 0.10 20
3 0.11 0.12 0.13 0.14 0.15 20
4 0.16 0.17 0.18 0.19 0.20 20
5 0.21 0.22 0.23 0.24 0.25 20

The data frame must be repeated N times, with the repeat number added as an extra column.

Is there a proper solution (a simple function in R) for this? In my case, if ddr is not declared first (ddr <- NA), the script does not work. Thanks!

Upvotes: 4

Views: 7392

Answers (2)

www

Reputation: 4224

Here is another, arguably more intuitive way. It is about as fast as the simplified version in the other top answer:

n <- 3
data.frame(dc, i = rep(1:n, each = NROW(dc)))

Output (repeated 3x):

   from   X1   X2   X3   X4   X5 i
1     1 0.01 0.02 0.03 0.04 0.05 1
2     2 0.06 0.07 0.08 0.09 0.10 1
3     3 0.11 0.12 0.13 0.14 0.15 1
4     4 0.16 0.17 0.18 0.19 0.20 1
5     5 0.21 0.22 0.23 0.24 0.25 1
6     1 0.01 0.02 0.03 0.04 0.05 2
7     2 0.06 0.07 0.08 0.09 0.10 2
8     3 0.11 0.12 0.13 0.14 0.15 2
9     4 0.16 0.17 0.18 0.19 0.20 2
10    5 0.21 0.22 0.23 0.24 0.25 2
11    1 0.01 0.02 0.03 0.04 0.05 3
12    2 0.06 0.07 0.08 0.09 0.10 3
13    3 0.11 0.12 0.13 0.14 0.15 3
14    4 0.16 0.17 0.18 0.19 0.20 3
15    5 0.21 0.22 0.23 0.24 0.25 3

EDIT: Top Answer Speed Test

This test was scaled up to n = 1e+05 and run for 100 iterations:

func1 <- function(){
  data.frame(dc, i = rep(1:n, each = NROW(dc)))
}

func2 <- function(){
  cbind(dc, i = rep(1:n, each = nrow(dc)))
}

func3 <- function(){
  cbind(dc[rep(1:nrow(dc), n), ], i = rep(1:n, each = nrow(dc)))
}

microbenchmark::microbenchmark(
  func1(),func2(),func3())

Unit: milliseconds
    expr       min        lq      mean    median        uq      max neval cld
 func1()  15.58709  21.69143  28.62695  22.01692  23.85648 117.9012   100  a 
 func2()  15.99023  21.59375  28.37328  22.18298  23.99953 136.1209   100  a 
 func3() 414.18741 436.51732 473.14571 453.26099 498.21576 666.8515   100   b

Upvotes: 4

Rich Scriven

Reputation: 99321

You can use rep() to replicate the row indexes, and also to create the repeat number column.

cbind(dc[rep(1:nrow(dc), n), ], i = rep(1:n, each = nrow(dc)))

Let's break it down:

  • dc[rep(1:nrow(dc), n), ] passes the row indexes 1 to nrow(dc), repeated n times, as the row (i) argument of [ for data frames, which stacks n copies of dc
  • rep(1:n, each = nrow(dc)) builds the sequence 1 to n with each value repeated nrow(dc) times, giving the repeat number for every row
  • cbind(...) combines the two into a single data frame
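
The three steps above can be tried one at a time on a smaller input. This is only a sketch: toy, idx and res are hypothetical names introduced for illustration, and toy holds just the first two rows and two columns of dc.

```r
# Toy data frame standing in for dc (hypothetical, for illustration only)
toy <- data.frame(from = 1:2, X1 = c(0.01, 0.06))
n <- 3

# Step 1: row indexes 1, 2 repeated n times, i.e. 1 2 1 2 1 2
idx <- rep(seq_len(nrow(toy)), n)

# Step 2: repeat counter, each value held for nrow(toy) rows, i.e. 1 1 2 2 3 3
i <- rep(seq_len(n), each = nrow(toy))

# Step 3: stack the replicated rows and attach the counter column
res <- cbind(toy[idx, ], i = i)

# Indexing with repeated rows leaves row names such as 1.1 and 2.1;
# resetting them gives plain 1..6
rownames(res) <- NULL
res
```

seq_len() is used here instead of 1:n only because it also behaves sensibly when the count is zero; for the values in the question the two are equivalent.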

As @HubertL points out in the comments, this can be further simplified to

cbind(dc, i = rep(1:n, each = nrow(dc)))

thanks to the magic of recycling: cbind() repeats the rows of dc until they match the length of i. Please go give him a vote.
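
To see that the recycled form gives the same result as explicit row indexing, the two can be compared on a small input. Again a sketch under assumptions: toy, short, long and bad are hypothetical names, and toy is a cut-down stand-in for dc.

```r
# Toy data frame standing in for dc (hypothetical, for illustration only)
toy <- data.frame(from = 1:2, X1 = c(0.01, 0.06))
n <- 3
i <- rep(seq_len(n), each = nrow(toy))  # length 6, a whole multiple of nrow(toy)

# Recycled form: the 2 rows of toy are repeated to match the 6 values of i
short <- cbind(toy, i = i)

# Explicit form: replicate the rows by index, then reset the row names
long <- cbind(toy[rep(seq_len(nrow(toy)), n), ], i = i)
rownames(long) <- NULL

all.equal(short, long)

# Recycling needs a whole number of repeats: 5 values against 2 rows
# raises an error, captured here as a message instead of stopping the script
bad <- tryCatch(cbind(toy, i = 1:5), error = function(e) conditionMessage(e))
bad
```

The shortcut is therefore safe here because the length of i is n * nrow(dc) by construction.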

Upvotes: 6
