user3251201

Reputation: 3

Using R to create a dataframe with random blocks of rows at a time

I am at a complete loss.

I have five dataframes, each with five rows in them, let's say df1, df2, ..., df5. These dataframes are fixed--there is no need to do any randomization within them.

I now want to create a dataframe with 500 rows in it, constructed by appending 100 blocks, where each block is one of the five dataframes chosen at random with equal probability. That is, the five rows of the chosen dataframe are appended en bloc, and the blocks appear in random order.

So, for example, one iteration could look like this:

ROW  df
1     df1[1,]
2     df1[2,]
3     df1[3,]
4     df1[4,]
5     df1[5,]
6     df5[1,]
7     df5[2,]
8     df5[3,]
9     df5[4,]
10    df5[5,]
...
496   df2[1,]
497   df2[2,]
498   df2[3,]
499   df2[4,]
500   df2[5,]

In other languages, I could draw a random number and use some sort of case statement, but I can't seem to find a way to do this in R.

Can anyone help? Thank you!

Upvotes: 0

Views: 346

Answers (3)

cdeterman

Reputation: 19950

If I understand you correctly, the following can do what you want:

df1 <- data.frame(value = rnorm(5), group = "A")
df2 <- data.frame(value = rnorm(5), group = "B")
df3 <- data.frame(value = rnorm(5), group = "C")
df4 <- data.frame(value = rnorm(5), group = "D")
df5 <- data.frame(value = rnorm(5), group = "E")

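df_list <- list(df1, df2, df3, df4, df5)
library(data.table)
# draw a fresh random order of the five dataframes 20 times, then flatten to one list of 100
blocks <- unlist(replicate(20, sample(df_list), simplify = FALSE), recursive = FALSE)
df <- as.data.frame(rbindlist(blocks))

sample randomly chooses the order of the five dataframes in each iteration, replicate repeats that draw 20 times (20 * 25 = 500 rows), unlist flattens the result into a single list of 100 dataframes, rbindlist is a fast rbind function from the data.table package, and as.data.frame gets you the output you describe. Note that each dataframe appears exactly 20 times this way.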

If you also want the rows within each dataframe permuted, you can just add an additional lapply call. Although it may not be the prettiest, I believe it is relatively simple if you break it into its separate elements:

shuffle_rows <- function(x) x[sample(nrow(x)), , drop = FALSE]
df <- as.data.frame(rbindlist(unlist(replicate(20, sample(lapply(df_list, shuffle_rows)), simplify = FALSE), recursive = FALSE)))
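
One design point worth making explicit: shuffling all five dataframes and repeating that 20 times gives a balanced result (each dataframe appears exactly 20 times), whereas drawing 100 blocks independently with replacement gives counts that vary from run to run. A minimal base R sketch of the two block orders, using the integer ids 1 to 5 as stand-ins for df1 to df5 (the names balanced and independent and the seed are only illustrative):

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# balanced: 20 fresh permutations of the ids 1..5, so every id occurs exactly 20 times
balanced <- as.vector(replicate(20, sample(1:5)))

# independent: 100 draws with replacement, each id has probability 1/5 per draw
independent <- sample(1:5, 100, replace = TRUE)

table(balanced)     # every count is 20
table(independent)  # counts vary around 20
```

Which of the two matches the question depends on whether "equal probability" is meant as equal counts or as independent draws.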

Upvotes: 0

DMT

Reputation: 1647

Assuming I understand your question correctly, you could do something like this:

# randomly sample row indices (with replacement) from each dataframe 100 times
rowSelection1 <- sample(1:5, 100, replace = TRUE)
rowSelection2 <- sample(1:5, 100, replace = TRUE)
rowSelection3 <- sample(1:5, 100, replace = TRUE)
rowSelection4 <- sample(1:5, 100, replace = TRUE)
rowSelection5 <- sample(1:5, 100, replace = TRUE)

# stack the sampled rows of all five dataframes (500 rows in total)
newDF <- rbind(df1[rowSelection1, ], df2[rowSelection2, ], df3[rowSelection3, ], df4[rowSelection4, ], df5[rowSelection5, ])

I'm sure you could generalize it, but this is just a quick answer.

This samples individual rows rather than whole dataframes, though, so you could do something like this instead:

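 library(data.table)  # for rbindlist

 # return the dataframe that matches the drawn id
 createNewRows <- function(dfid) {
     switch(as.character(dfid),
            "1" = df1,
            "2" = df2,
            "3" = df3,
            "4" = df4,
            "5" = df5)
 }

 # draw 100 dataframe ids with equal probability and collect the chosen blocks
 rowList <- lapply(sample(1:5, 100, replace = TRUE), createNewRows)

 newDF <- rbindlist(rowList)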
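
The switch-based lookup can be generalized to any number of dataframes by indexing a list directly. A minimal base R sketch, assuming a hypothetical helper called stack_random_blocks and toy data (neither comes from the answer above):

```r
# hypothetical helper: stack n_blocks randomly chosen data frames from a list
stack_random_blocks <- function(dfs, n_blocks) {
  ids <- sample(seq_along(dfs), n_blocks, replace = TRUE)  # equal probability per draw
  out <- do.call(rbind, unname(dfs[ids]))                   # append whole blocks in drawn order
  rownames(out) <- NULL
  out
}

# toy stand-ins for df1..df5 (assumed data, five rows each)
toy <- lapply(LETTERS[1:5], function(g) data.frame(value = rnorm(5), group = g))
newDF <- stack_random_blocks(toy, 100)
dim(newDF)  # 500 rows, 2 columns
```

Keeping the dataframes in a list avoids hard-coding one switch branch per dataframe.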

Upvotes: 0

akrun

Reputation: 886938

Not sure if I understand correctly. You could try:

library(data.table)
df_list <- mget(ls(pattern = "df\\d+"))  # using the data from @charles though without a set.seed()
res <- rbindlist(df_list[sample(seq_along(df_list), 100, replace = TRUE)])

  res[1:10,]
 #        value
 #1: -0.81396114
 #2:  1.34798534
 #3:  0.08308022
 #4: -0.18476069
 #5:  0.58039641
 #6: -1.18188902
 #7: -0.74525519
 #8:  0.17258696
 #9: -1.20630019
#10:  1.42088692

  df_list[4:5]
 #$df4
 #      value
 #1 -1.1818890
 #2 -0.7452552
 #3  0.1725870
 #4 -1.2063002
 #5  1.4208869

 #$df5
 #       value
 #1 -0.81396114
 #2  1.34798534
 #3  0.08308022
 #4 -0.18476069
 #5  0.58039641
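
Because the code above runs without set.seed(), its output will differ on every run. A small sketch of how the draw could be made reproducible and how the source of each block could be recorded for checking; it uses base R only, and toy_list, the seed value, and the source column are assumptions for illustration:

```r
# toy stand-ins for df1..df5: the value encodes the source (df k holds k*10 + 1:5)
toy_list <- setNames(lapply(1:5, function(k) data.frame(value = k * 10 + 1:5)),
                     paste0("df", 1:5))

set.seed(123)  # arbitrary seed; fixes the random block order
block_order <- sample(names(toy_list), 100, replace = TRUE)

res_toy <- do.call(rbind, unname(toy_list[block_order]))
res_toy$source <- rep(block_order, each = 5)  # label every row with its source dataframe
rownames(res_toy) <- NULL

head(res_toy, 10)           # first two blocks, five rows each
table(res_toy$source) / 5   # how many times each dataframe was chosen
```

With the source column in place, it is easy to confirm that rows always arrive in complete blocks of five.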

Upvotes: 1
