Reputation: 420
I want to draw Y for each row in group g in a data.table. To illustrate, take this simple simulation:
set.seed(123)
N = 50
g = 3
DT = data.table(id = rep(1:N,g),
group_id = sort(rep(1:g, N)),
p = runif(150, min = 0, max = 1)
)
DT[, Y := rbinom(n = .N, size = 1, prob = p), by = group_id]
DTs = split(DT, by = "group_id")
DTs = rbindlist(DTs[1], idcol=T)
DTs[, Y := rbinom(n = .N, size = 1, prob = p)]
head(DT)
head(DTs)
Why does Y differ for DT and DTs? I thought this to be equivalent.
> head(DT)
id group_id p Y
1: 1 1 0.2875775 1
2: 2 1 0.7883051 1
3: 3 1 0.4089769 0
4: 4 1 0.8830174 1
5: 5 1 0.9404673 1
6: 6 1 0.0455565 0
> head(DTs)
.id id group_id p Y
1: 1 1 1 0.2875775 1
2: 1 2 1 0.7883051 1
3: 1 3 1 0.4089769 1
4: 1 4 1 0.8830174 1
5: 1 5 1 0.9404673 1
6: 1 6 1 0.0455565 0
Upvotes: 0
Views: 230
Reputation: 173928
The random number generator is shuffled every time you perform a random draw, so it is a different state at the point of the second binomial draw. If you want the same draw, you need to ensure the seed is set just before the draw:
set.seed(123)
N = 50
g = 3
DT = data.table(id = rep(1:N,g),
group_id = sort(rep(1:g, N)),
p = runif(150, min = 0, max = 1) # RNG changes state
)
# Specify seed for binomial draw
set.seed(456)
DT[, Y := rbinom(n = .N, size = 1, prob = p), by = group_id] # RNG changes state
DTs = split(DT, by = "group_id")
DTs = rbindlist(DTs[1], idcol=T)
# If we want the same draw, we need to set the seed to the same state
set.seed(456)
DTs[, Y := rbinom(n = .N, size = 1, prob = p)]
Giving you
head(DT)
#> id group_id p Y
#> 1: 1 1 0.2875775 0
#> 2: 2 1 0.7883051 1
#> 3: 3 1 0.4089769 1
#> 4: 4 1 0.8830174 1
#> 5: 5 1 0.9404673 1
#> 6: 6 1 0.0455565 0
head(DTs)
#> .id id group_id p Y
#> 1: 1 1 1 0.2875775 0
#> 2: 1 2 1 0.7883051 1
#> 3: 1 3 1 0.4089769 1
#> 4: 1 4 1 0.8830174 1
#> 5: 1 5 1 0.9404673 1
#> 6: 1 6 1 0.0455565 0
Upvotes: 1