Reputation: 3
I am at a complete loss.
I have five dataframes, each with five rows in them, let's say df1, df2, ..., df5. These dataframes are fixed; there is no need to do any randomization within them.
I now want to create a dataframe with 500 rows in it, which is constructed by randomly appending each of the five dataframes 100 times, each with equal probability. That is, the five rows of each dataframe are appended, en bloc, in random orders.
So, for example, one iteration could look like this:
ROW df
1 df1[1,]
2 df1[2,]
3 df1[3,]
4 df1[4,]
5 df1[5,]
6 df5[1,]
7 df5[2,]
8 df5[3,]
9 df5[4,]
10 df5[5,]
...
496 df2[1,]
497 df2[2,]
498 df2[3,]
499 df2[4,]
500 df2[5,]
In other languages, I could draw a random number and use some sort of case terminology, but I can't seem to find a way to do this in R.
Can anyone help? Thank you!
Upvotes: 0
Views: 346
Reputation: 19950
If I understand you correctly, the following can do what you want:
df1 <- data.frame(value = rnorm(5), group = "A")
df2 <- data.frame(value = rnorm(5), group = "B")
df3 <- data.frame(value = rnorm(5), group = "C")
df4 <- data.frame(value = rnorm(5), group = "D")
df5 <- data.frame(value = rnorm(5), group = "E")
df_list <- list(df1, df2, df3, df4, df5)
require(data.table)
df <- rbindlist(rep(rbind(sample(df_list, 5)), 20))
sample randomly chooses the order of the five dataframes, rbindlist is a fast rbind function from the data.table package, and rep repeats that shuffled list 20 times (20 * 25 = 500 rows). rbindlist returns a data.table; wrap the result in as.data.frame if you need a plain data.frame like the output you describe.
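Since sample runs only once in the one-liner, every repeat reuses the same order. If each round of five should get its own shuffle, one possible variation is sketched below. It uses base R only, and the toy data, the fixed seed, and the names rounds and blocks are assumptions made for illustration:

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Toy stand-ins for df1..df5, five rows each
df_list <- lapply(LETTERS[1:5], function(g) data.frame(value = rnorm(5), group = g))

# 20 rounds; each round is an independent shuffle of all five dataframes
rounds <- replicate(20, df_list[sample(5)], simplify = FALSE)

# Flatten the 20 rounds into one list of 100 dataframes, then stack them
blocks <- do.call(c, rounds)
df <- do.call(rbind, blocks)

nrow(df)
```

With this approach every dataframe still appears exactly 20 times. If each block should instead be drawn independently with equal probability, sampling the list with replace = TRUE is the closer match to the question.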
If you also want the rows within each dataframe permuted, you can add an additional lapply call. Although it may not be the prettiest, I believe it is relatively simple if you break it into the separate elements:
df <- rbindlist(rep(rbind(sample(lapply(df_list, FUN = function(x) as.data.frame(x[sample(1:5),])), 5)), 20))
Upvotes: 0
Reputation: 1647
Assuming I understand your question correctly, you could do something like this:
#we randomly sample the rows of each dataframe 100 times
rowSelection1<-sample(1:5, 100, replace=TRUE)
rowSelection2<-sample(1:5, 100, replace=TRUE)
rowSelection3<-sample(1:5, 100, replace=TRUE)
rowSelection4<-sample(1:5, 100, replace=TRUE)
rowSelection5<-sample(1:5, 100, replace=TRUE)
newDF<-rbind(df1[rowSelection1,], df2[rowSelection2,], df3[rowSelection3,], df4[rowSelection4,], df5[rowSelection5,])
I'm sure you could generalize it, but this is just a quick answer.
This doesn't randomly sample from the dfs though, so you could do something like this:
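library(data.table) #provides rbindlist
#return the dataframe that matches the sampled id
createNewRows<-function(dfid){
switch(as.character(dfid), #match on the names below rather than by position
"1"=df1,
"2"=df2,
"3"=df3,
"4"=df4,
"5"=df5)
}
rowList<-lapply(sample(1:5, 100, replace=TRUE), createNewRows)
rbindlist(rowList)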
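One way the switch version might be generalized is to keep the dataframes in a list and index that list with the sampled ids. The following is only a sketch in base R; stack_random_blocks is a hypothetical helper name, and the toy data and seed are assumptions:

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Toy stand-ins for df1..df5 (assumes all dataframes share the same columns)
df_list <- lapply(1:5, function(i) data.frame(value = rnorm(5), source = paste0("df", i)))

# Hypothetical helper: draw n_blocks dataframes with equal probability
# and stack them, keeping each dataframe's rows together
stack_random_blocks <- function(dfs, n_blocks) {
  ids <- sample(seq_along(dfs), n_blocks, replace = TRUE)
  out <- do.call(rbind, dfs[ids])
  rownames(out) <- NULL  # drop the row names that rbind carries over
  out
}

newDF <- stack_random_blocks(df_list, 100)
dim(newDF)
```

Indexing a list this way avoids hard-coding one switch branch per dataframe, so the same function should work for any number of dataframes.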
Upvotes: 0
Reputation: 886938
Not sure if I understand correctly. You could try:
library(data.table)
df_list <- mget(ls(pattern="df\\d+")) #using the data from @charles though without a set.seed()
res <- rbindlist(df_list[sample(seq_along(df_list),100, replace=TRUE)])
res[1:10,]
# value
#1: -0.81396114
#2: 1.34798534
#3: 0.08308022
#4: -0.18476069
#5: 0.58039641
#6: -1.18188902
#7: -0.74525519
#8: 0.17258696
#9: -1.20630019
#10: 1.42088692
df_list[4:5]
#$df4
# value
#1 -1.1818890
#2 -0.7452552
#3 0.1725870
#4 -1.2063002
#5 1.4208869
#$df5
# value
#1 -0.81396114
#2 1.34798534
#3 0.08308022
#4 -0.18476069
#5 0.58039641
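To check which dataframe each block came from without comparing values by eye, the drawn names can be stored and attached as a column. This is a sketch in base R rather than the code from this answer; the picks vector, the source column, the toy data, and the seed are all assumptions:

```r
set.seed(123)  # assumption: fixed seed only so the sketch is reproducible

# Toy stand-ins for df1..df5, stored in a named list
df_list <- setNames(
  lapply(1:5, function(i) data.frame(value = rnorm(5))),
  paste0("df", 1:5)
)

# Draw 100 dataframe names with equal probability, with replacement
picks <- sample(names(df_list), 100, replace = TRUE)

# Stack the chosen dataframes in the drawn order
res <- do.call(rbind, unname(df_list[picks]))
rownames(res) <- NULL

# Label every row with the dataframe it was copied from
res$source <- rep(picks, times = vapply(df_list[picks], nrow, integer(1)))

head(res, 10)
table(picks)  # how often each dataframe was drawn
```

Because the draws are independent, the counts in table(picks) will usually not be exactly 20 each.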
Upvotes: 1