tchakravarty

Reputation: 10954

R data.table: Generate random numbers

I have a large data.table and want to generate binomial random numbers with rbinom, using the values in one of the columns as the probability parameter. Assume that index is a unique row identifier and that the parameter is in the responseProb column. At the moment I draw one value per row by grouping on index:

dt[, response := rbinom(1, 1, responseProb), by = index]

rbinom's signature is rbinom(n, size, prob). It does not seem to be vectorized over the prob argument and appears to take only a scalar there, so I can't write the following, although I would like to:

dt[, response := rbinom(1, 1, responseProb)]

To give a simple example of what I mean, rbinom(1, 1, seq(0.1, 0.9, .1)) yields a single value rather than one value per probability:

> rbinom(1, 1, seq(0.1, 0.9, .1))
[1] 1

I think that the solution to this is to use

dt[, response := rbinom(responseProb, 1, responseProb)]

but I want to double-check that this gives the same result as the first line of code.

Upvotes: 1

Views: 3499

Answers (1)

shadow

Reputation: 22293

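rbinom is in fact vectorized over prob; the first argument n only controls how many values are drawn. Pass .N (the number of rows) as the first argument to get one draw per row: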

dt[, response := rbinom(.N, 1, responseProb)]
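
A minimal sketch of why this works, using base R only. The vector name probs is made up for illustration, and the probabilities are set to exactly 0 and 1 so that the draws are deterministic and the pairing between draws and probabilities is visible without a seed. It also shows why the call in the question returned a single value: with n = 1 only one draw is requested.

```r
# Probabilities of exactly 0 and 1 make every draw deterministic,
# so the result does not depend on the random seed.
probs <- c(0, 1, 0, 1)

# n equals length(probs): draw i is generated with probs[i]
draws <- rbinom(length(probs), 1, probs)
print(draws)

# n = 1 requests a single draw, so only the first probability is used
single <- rbinom(1, 1, probs)
print(length(single))
```

Inside a data.table call without by, .N is the number of rows, so it plays the role of length(probs) here and the one-line .N form draws one value per row.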

To check that the .N version gives the same result as the by = index version, set the same seed before each call and compare the two columns.

# create reproducible example
library(data.table)
N <- 100
dt <- data.table(responseProb = runif(N),
                 index = 1:N)
# set seed
set.seed(1)
# your original version
dt[, response := rbinom(1, 1, responseProb), by = index]
# set seed again
set.seed(1)
# version with .N
dt[, response2 := rbinom(.N, 1, responseProb)]
# check for equality
dt[, all(response == response2)]
## [1] TRUE
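
As for the variant in the question that passes the probability column itself as the first argument: the rbinom help page states that when length(n) > 1, the length of n is taken as the number of draws. The sketch below illustrates that rule with a made-up vector probs, again using probabilities of 0 and 1 so the result is deterministic.

```r
# Hypothetical probabilities; 0 and 1 keep the draws deterministic
probs <- c(0, 1, 1, 0, 1)

# length(probs) > 1, so rbinom() uses its length (5) as the number of draws
draws_vec_n <- rbinom(probs, 1, probs)

# The explicit form, equivalent to passing .N inside a data.table
draws_len_n <- rbinom(length(probs), 1, probs)

print(draws_vec_n)
print(identical(draws_vec_n, draws_len_n))
```

The two forms should agree whenever the column has more than one row. With a single row the length rule no longer applies and the value itself would be read as the count, which is one reason to prefer .N.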

Upvotes: 3
