Reputation: 491
I want to generate 4 new columns from an existing variable total by random sampling.
The results for each row should meet the condition s1 + s2 + s3 + s4 == total.
For example,
> tabulate(sample.int(4, 100, replace = TRUE))
[1] 22 21 27 30
The following code does not work, since the function appears to recycle the first row and apply it column-wise.
library(data.table)
DT <- data.table(total = c(100, 110, 90, 92))
DT[, c(paste0("s", 1:4)) := tabulate(sample.int(4, total, replace = TRUE))]
> DT
total s1 s2 s3 s4
1: 100 31 31 31 31
2: 110 25 25 25 25
3: 90 22 22 22 22
4: 92 22 22 22 22
How can I get around this? I am clearly missing some basic understanding of how R
vectors and lists work. Your help will be much appreciated.
Upvotes: 1
Views: 152
Reputation: 102625
Maybe you can try the code below
DTout <- cbind(
  DT,
  do.call(
    rbind,
    lapply(DT$total, function(x) diff(sort(c(0, sample(x - 1, 3), x))))
  )
)
which gives
total V1 V2 V3 V4
1: 100 51 5 17 27
2: 110 41 1 40 28
3: 90 32 34 14 10
4: 102 5 73 13 11
5: 92 17 13 17 45
Test
> rowSums(DTout[,-1])
[1] 100 110 90 102 92
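The one-liner above relies on a "cut point" idea: draw 3 distinct cut points between 1 and total - 1, sort them, and take the gaps between consecutive cuts. A minimal sketch of the same idea as a standalone function follows. The name split_total and its k argument are hypothetical, introduced only for illustration. Note that every part is at least 1 with this method, so it is not the same distribution as the tabulate(sample.int(...)) approach in the question, which can produce zeros.

```r
# split_total() is a hypothetical helper, not part of base R or data.table.
# It splits `total` into `k` positive integers that sum to `total`.
split_total <- function(total, k = 4) {
  stopifnot(total >= k)                       # each part must be at least 1
  cuts <- sort(sample.int(total - 1, k - 1))  # k - 1 distinct cut points in 1..(total - 1)
  diff(c(0, cuts, total))                     # gaps between consecutive cuts
}

set.seed(1)
totals <- c(100, 110, 90, 102, 92)

# One row of 4 parts per total (vapply returns 4 x 5, so transpose)
parts <- t(vapply(totals, split_total, numeric(4)))

# Each row should add up to its total
all(rowSums(parts) == totals)
```

The resulting matrix can be attached to the table with cbind(DT, parts), as in the answer above.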
Upvotes: 0
Reputation: 151
Edited following the edited question:
data.table expects a list on the right-hand side when you assign to several columns at once. To draw a separate sample for each row, group by row with by:
DT <- data.table(total = c(100, 110, 90, 102, 92))
DT[, c(paste0("s", 1:4)) := {
  # nbins = 4 guarantees 4 counts even if some value is never drawn
  as.list(tabulate(sample.int(4, total, replace = TRUE), nbins = 4))
}, by = seq(NROW(DT))]
This outputs the following, which satisfies the OP's criteria:
> DT
total s1 s2 s3 s4
1: 100 27 28 28 17
2: 110 25 23 36 26
3: 90 26 19 26 19
4: 102 28 24 21 29
5: 92 17 27 22 26
> apply(DT[, 2:5],1, sum)
[1] 100 110 90 102 92
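Counting equal-probability draws with tabulate(sample.int(4, n, replace = TRUE)) gives multinomial counts, so the same result can probably be obtained more directly with stats::rmultinom, which always returns one count per category (zeros included). The following is a sketch under that assumption, not a replacement for the answer above. The seed and the equal probabilities rep(0.25, 4) are illustrative choices.

```r
library(data.table)

set.seed(42)
DT <- data.table(total = c(100, 110, 90, 102, 92))

# Within each by-group `total` is a single number, so rmultinom(1, total, ...)
# returns a 4 x 1 matrix of counts summing to total. Take that column and
# wrap it in a list so data.table assigns one value per new column.
DT[, paste0("s", 1:4) := as.list(rmultinom(1, total, rep(0.25, 4))[, 1]),
   by = seq_len(nrow(DT))]

# Check that the four parts add up to total in every row
DT[, all(s1 + s2 + s3 + s4 == total)]
```

Unequal shares can be requested by changing the prob vector, for example c(0.4, 0.3, 0.2, 0.1).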
Upvotes: 1