Reputation: 10954
I have a large data.table and I am trying to generate binomial random numbers (using rbinom), with the values of one of the columns as the probability parameter of the distribution. Assume that index is a unique row identifier, and that the parameter is in the responseProb column. Then the row-by-row version is:
dt[, response := rbinom(1, 1, responseProb), by = index]
rbinom's signature is rbinom(n, size, prob), but since it does not appear to be vectorized over the prob argument, it seems it can only take a scalar as input, so I can't write the following, although I would like to:
dt[, response := rbinom(1, 1, responseProb)]
To give a simple example of what I mean, rbinom(1, 1, seq(0.1, 0.9, .1)) yields
> rbinom(1, 1, seq(0.1, 0.9, .1))
[1] 1
I think that the solution to this is to use
dt[, response := rbinom(probResponse, 1, responseProb)]
but want to double check that this would lead to the same answer as the first line of code.
Upvotes: 1
Views: 3499
Reputation: 22293
rbinom is in fact vectorized over prob; it only returned a single value in your example because n was 1. Use .N (the number of rows) as the first argument:
dt[, response := rbinom(.N, 1, responseProb)]
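To see the vectorization without any randomness getting in the way, here is a minimal base R sketch. It assumes probabilities of exactly 0 and 1, so every draw is deterministic; the variable names (probs, draws, single) are only illustrative and not part of the question.

```r
# Probabilities of exactly 0 or 1 make each Bernoulli draw deterministic
probs <- c(0, 1, 0, 1)

# n = length(probs): one draw per element of probs, each with its own probability
draws <- rbinom(length(probs), 1, probs)
print(draws)           # 0 1 0 1

# n = 1: only a single draw is made, using the first probability
single <- rbinom(1, 1, probs)
print(length(single))  # 1
```

The first argument controls how many values come back, which is why passing 1 together with a vector of probabilities returns just one number.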
To check that this gives the same result as the indexing solution, just set a seed and repeat.
# create reproducible example
library(data.table)
N <- 100
dt <- data.table(responseProb = runif(N),
index = 1:N)
# set seed
set.seed(1)
# your original version
dt[, response := rbinom(1, 1, responseProb), by = index]
# set seed again
set.seed(1)
# version with .N
dt[, response2 := rbinom(.N, 1, responseProb)]
# check for equality
dt[, all(response == response2)]
## [1] TRUE
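The same comparison can be sketched in base R, without data.table, by drawing one value per probability in a loop and then drawing them all in one call from the same seed. This is only an illustration under the assumption that both approaches consume the random number stream in the same order; probs, looped and vectorized are hypothetical names.

```r
# Some arbitrary probabilities to draw with
set.seed(42)
probs <- runif(10)

# One call per probability, mirroring the by = index version
set.seed(1)
looped <- sapply(probs, function(p) rbinom(1, 1, p))

# A single vectorized call, mirroring the .N version
set.seed(1)
vectorized <- rbinom(length(probs), 1, probs)

# Compare the two sets of draws element by element
print(all(looped == vectorized))
```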
Upvotes: 3