Reputation: 421
I'm trying to create 10 or more pseudo data frames. Each data frame should have 9 columns and 5 rows (Mon, Tue, Wed, Thu, Fri), and each row should sum to 9, like the one below:
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon 2 1 0 2 0 0 1 1 2
Tue 1 1 1 1 0 0 2 1 2
Wed 2 1 0 2 1 1 1 1 0
Thu 0 0 1 1 3 0 2 2 0
Fri 1 0 0 1 1 0 2 2 2
How can I generate multiple data frames that meet this condition?
Upvotes: 3
Views: 77
Reputation: 101373
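I think you can try rmultinom, like below. rmultinom(5, 9, rep(1, 9)) returns a 9 x 5 matrix whose columns are independent draws of 9 items into 9 equally likely categories, so each column sums to 9; t() turns those columns into the rows you need.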
> set.seed(0)
> (d <- as.data.frame(t(rmultinom(5, 9, rep(1, 9)))))
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 2 0 1 1 2 0 2 1 0
2 1 1 0 0 0 2 1 3 1
3 1 1 4 0 1 1 0 1 0
4 0 0 1 0 1 3 1 1 2
5 1 1 0 2 1 2 0 1 1
# verify that every row sums to 9
> rowSums(d)
[1] 9 9 9 9 9
If you want to wrap the code in a function for easier use, you can try the following (note that each row then sums to nrFcts, the number of columns):
f <- function(nrFcts, nrRows = 5) {
  setNames(
    as.data.frame(t(rmultinom(nrRows, nrFcts, rep(1, nrFcts)))),
    paste0("Factor", seq_len(nrFcts))
  )
}
such that
> f(9)
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1 1 2 1 1 1 1 1 0 1
2 1 1 1 1 2 1 0 0 2
3 0 1 1 1 1 3 0 1 1
4 0 1 0 1 2 0 3 1 1
5 2 0 0 1 2 2 0 2 0
and
> f(4, 10)
Factor1 Factor2 Factor3 Factor4
1 1 2 1 0
2 2 1 1 0
3 2 0 1 1
4 1 1 1 1
5 2 0 0 2
6 0 0 2 2
7 1 1 1 1
8 2 0 1 1
9 1 1 1 1
10 1 2 0 1
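The question asks for 10 or more data frames with Mon to Fri as row names, while f above uses 1 to 5 as row names and ties the row total to nrFcts. A minimal sketch, assuming the row total should stay at 9 independently of the number of columns, could wrap rmultinom in lapply; gen_pseudo_df and its arguments are hypothetical names, not part of the answer.

```r
# Hypothetical helper: one data frame whose rows are multinomial draws
gen_pseudo_df <- function(n_factors = 9, total = 9,
                          days = c("Mon", "Tue", "Wed", "Thu", "Fri")) {
  # rmultinom() returns one draw per column; t() turns each draw into a row
  m <- t(rmultinom(length(days), size = total, prob = rep(1, n_factors)))
  dimnames(m) <- list(days, paste0("Factor", seq_len(n_factors)))
  as.data.frame(m)
}

set.seed(0)
# Ten independent data frames collected in a list
df_list <- lapply(1:10, function(i) gen_pseudo_df())
df_list[[1]]
```

Passing a different total, e.g. gen_pseudo_df(4, total = 10), changes the row sum without changing the number of columns.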
Upvotes: 1
Reputation: 37641
Here is a function that generates random data frames to your specifications.
GenDF = function() {
  M = matrix(0, nrow = 5, ncol = 9)
  for (i in 1:5) {
    # draw 9 column indices with replacement
    S = sample(9, 9, replace = TRUE)
    # add one item to each drawn column
    for (j in S) { M[i, j] = M[i, j] + 1 }
  }
  rownames(M) = c('Mon', 'Tue', 'Wed', 'Thu', 'Fri')
  colnames(M) = paste('Factor', 1:9, sep = '')
  as.data.frame(M)
}
GenDF()
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
Mon 3 3 1 1 0 0 0 0 1
Tue 3 1 0 1 0 2 0 2 0
Wed 1 0 1 1 0 1 2 1 2
Thu 1 2 0 1 1 1 3 0 0
Fri 0 1 1 2 2 0 0 3 0
To elaborate on why the rows sum to nine: the line
S = sample(9, 9, replace = TRUE)
will choose nine numbers between one and nine, with replacement. Each selected number represents one of the nine items to be distributed across the nine columns, and its value tells you which column that item goes into. Because the selection is made with replacement, a column can receive more than one of the nine items (and some columns receive none), but each row always receives exactly nine items in total.
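The inner loop only counts how many of the nine draws landed in each column. A sketch of the same idea without the explicit loop, assuming base R's tabulate() for the counting; gen_df_tab is a hypothetical name. Assigning each item independently and uniformly to a column gives the same distribution as rmultinom with equal probabilities.

```r
# Hypothetical loop-free variant of GenDF
gen_df_tab <- function(days = c("Mon", "Tue", "Wed", "Thu", "Fri"),
                       n_cols = 9, total = 9) {
  # For each day: draw `total` column indices with replacement, then count
  # how often each index appears (nbins keeps columns that got zero items)
  m <- t(replicate(length(days),
                   tabulate(sample(n_cols, total, replace = TRUE),
                            nbins = n_cols)))
  dimnames(m) <- list(days, paste0("Factor", seq_len(n_cols)))
  as.data.frame(m)
}

set.seed(1)
d <- gen_df_tab()
rowSums(d)
```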
Upvotes: 3
Reputation: 126
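Using data.table, you can shuffle the values within each row of your example data. Every row keeps its sum of 9 (and also keeps the same set of values):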
library(data.table)
dt <- fread("Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
2 1 0 2 0 0 1 1 2
1 1 1 1 0 0 2 1 2
2 1 0 2 1 1 1 1 0
0 0 1 1 3 0 2 2 0
1 0 0 1 1 0 2 2 2")
set.seed(123)
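dt_list <- vector("list", 10)
for (i in 1:10) {
  # shuffle the values within each row (grouping by row number),
  # then drop the grouping column
  dt_tmp <- dt[, sample(.SD), by = .(seq_len(nrow(dt)))][, -1]
  # restore the original column names
  setnames(dt_tmp, names(dt))
  dt_list[[i]] <- dt_tmp
}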
dt_list
[[1]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 0 2 1 1 0 1 2 2
2: 0 1 1 1 0 2 1 1 2
3: 0 2 1 0 1 1 1 1 2
4: 1 1 0 0 0 3 2 0 2
5: 2 1 2 0 0 1 2 1 0
[[2]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 2 1 0 2 2 0 1 1
2: 1 2 1 1 0 2 0 1 1
3: 2 1 1 0 2 1 0 1 1
4: 1 3 2 0 1 0 0 0 2
5: 2 2 0 0 1 2 1 0 1
[[3]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 0 0 2 1 1 2 2 1
2: 1 1 1 2 1 0 0 2 1
3: 2 1 2 1 0 0 1 1 1
4: 2 0 2 1 3 0 1 0 0
5: 2 0 0 1 0 2 1 2 1
[[4]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 2 1 0 1 1 2 0 2
2: 1 1 1 2 1 0 1 2 0
3: 0 1 1 0 2 1 1 2 1
4: 1 0 0 0 0 1 2 2 3
5: 2 0 1 2 0 0 1 2 1
[[5]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 2 1 2 1 1 2 0 0 0
2: 2 0 1 1 2 1 0 1 1
3: 0 1 1 1 1 2 1 0 2
4: 0 2 0 1 0 3 1 0 2
5: 1 0 2 0 2 1 0 1 2
[[6]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 1 1 2 0 2 0 2 0 1
2: 0 1 1 1 2 2 0 1 1
3: 1 1 2 0 1 2 1 0 1
4: 0 2 3 0 1 1 0 0 2
5: 0 1 2 1 1 0 2 2 0
[[7]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 2 0 1 1 0 0 1 2 2
2: 2 1 1 0 0 1 1 2 1
3: 1 1 1 2 1 2 1 0 0
4: 0 0 3 0 1 2 1 0 2
5: 2 1 0 2 2 0 1 0 1
[[8]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 2 2 1 0 2 0 1 1
2: 1 2 1 1 1 0 0 2 1
3: 0 2 1 1 1 1 2 1 0
4: 2 3 2 1 0 0 0 0 1
5: 0 0 1 0 2 1 2 2 1
[[9]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 2 0 1 1 1 2 0 0 2
2: 1 0 1 1 2 1 1 2 0
3: 1 0 2 2 1 1 0 1 1
4: 1 0 2 0 3 1 2 0 0
5: 1 1 1 2 0 2 0 2 0
[[10]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
1: 0 1 1 0 2 0 1 2 2
2: 1 1 1 1 0 2 1 2 0
3: 1 1 2 2 0 0 1 1 1
4: 2 0 3 2 0 0 1 1 0
5: 2 0 1 0 2 2 1 0 1
# To validate they match the condition
lapply(dt_list, rowSums)
[[1]]
[1] 9 9 9 9 9
[[2]]
[1] 9 9 9 9 9
[[3]]
[1] 9 9 9 9 9
[[4]]
[1] 9 9 9 9 9
[[5]]
[1] 9 9 9 9 9
[[6]]
[1] 9 9 9 9 9
[[7]]
[1] 9 9 9 9 9
[[8]]
[1] 9 9 9 9 9
[[9]]
[1] 9 9 9 9 9
[[10]]
[1] 9 9 9 9 9
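Printing rowSums for every element works, but with many data frames a single TRUE/FALSE is easier to read. A minimal sketch, where all_valid is a hypothetical helper that checks each element's shape and row totals:

```r
# Hypothetical helper: TRUE only if every element has the expected shape
# and every row sums to `total`
all_valid <- function(lst, n_rows = 5, n_cols = 9, total = 9) {
  all(vapply(lst, function(x) {
    nrow(x) == n_rows && ncol(x) == n_cols && all(rowSums(x) == total)
  }, logical(1)))
}

# Small check on hand-built data frames (2 rows, 2 columns)
good <- list(data.frame(a = c(4, 5), b = c(5, 4)))  # rows sum to 9
bad  <- list(data.frame(a = c(4, 5), b = c(5, 3)))  # second row sums to 8
all_valid(good, n_rows = 2, n_cols = 2)  # TRUE
all_valid(bad,  n_rows = 2, n_cols = 2)  # FALSE
```

With the defaults, all_valid(dt_list) checks the list from this answer in one call; nrow(), ncol() and rowSums() also accept data.table objects.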
# To check that they differ (their column sums are not all the same)
lapply(dt_list, colSums)
[[1]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
3 5 6 2 2 7 7 5 8
[[2]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
6 10 5 1 6 7 1 3 6
[[3]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
7 2 5 7 5 3 5 7 4
[[4]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
4 4 4 4 4 3 7 8 7
[[5]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
5 4 6 4 6 9 2 2 7
[[6]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
2 6 10 2 7 5 5 3 5
[[7]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
7 3 6 5 4 5 5 4 6
[[8]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
3 9 7 4 4 4 4 6 4
[[9]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
6 1 7 6 7 7 3 5 3
[[10]]
Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
6 3 8 5 4 4 5 6 4
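The same row-wise shuffle can be sketched in base R, assuming the example is held as a numeric matrix; shuffle_rows is a hypothetical name. Differing colSums suggest but do not prove that the data frames differ, so the sketch also checks for exact duplicates with anyDuplicated().

```r
# The example data frame from the question, as a numeric matrix
m <- matrix(c(2, 1, 0, 2, 0, 0, 1, 1, 2,
              1, 1, 1, 1, 0, 0, 2, 1, 2,
              2, 1, 0, 2, 1, 1, 1, 1, 0,
              0, 0, 1, 1, 3, 0, 2, 2, 0,
              1, 0, 0, 1, 1, 0, 2, 2, 2),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("Mon", "Tue", "Wed", "Thu", "Fri"),
                            paste0("Factor", 1:9)))

# Hypothetical helper: permute the values within each row of `mat`
shuffle_rows <- function(mat) {
  # sample.int() avoids sample()'s special case for length-1 numeric input;
  # apply() returns each permuted row as a column, so transpose back
  out <- t(apply(mat, 1, function(row) unname(row)[sample.int(length(row))]))
  colnames(out) <- colnames(mat)
  as.data.frame(out)
}

set.seed(123)
shuffled <- lapply(1:10, function(i) shuffle_rows(m))

# Every row still sums to 9, and no two data frames are identical
sapply(shuffled, function(d) all(rowSums(d) == 9))
anyDuplicated(shuffled) == 0
```

Like the data.table version, this only rearranges the values already present in each row; rmultinom-based approaches produce new row compositions instead.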
Upvotes: 3