R: Update columns in a data.table

Question

In R I have a data.table of the following structure:

DT <- data.table(M=c(1,2,3,4,5), N=c(2,3,1,1,4), mu=c(1,10,100,1000,10000), sigma=c(10,10,10,10,10))

Here M is the simulation number, N is the number of observations, and mu and sigma are the parameters of the normal distribution. For each row I want to draw N random numbers from a normal distribution with that row's mu and sigma. For example, for the second row this means drawing 3 normally distributed random numbers with mu=10 and sigma=10:

rnorm(3,10,10)

These random numbers should be written into DT. To hold them, I add as many columns to DT as the maximum of N:

DT[, paste0("X.", seq(1, max(DT[, N]))):=NA]

So for simulation M=2 I want to update only columns X.1, X.2 and X.3 with three normally distributed random numbers with mu=10 and sigma=10. But how can I do that efficiently for a really big data.table?

I have tried to solve this with a for loop over the columns, using the set function:

for (j in 5:ncol(DT)) {
     X.random <- rnorm(n=DT[, N], mean=DT[, mu], sd=DT[, sigma])
     set(DT, j=j, value=X.random)
}

But this way the "condition" N, the number of observations, is not taken into account, because all columns X.1:X.4 are updated for every row. I am also not sure whether the parameters are applied per row. How can I do that?

Edit: Without having read your answers, I've updated the for loop:

for (j in 5:ncol(DT)) {
    idx <- which(DT[, N]-(j-4) >= 0)
    X.random <- rnorm(n=DT[idx, N], mean=DT[idx, mu], sd=DT[idx, sigma])
    set(DT, i=idx, j=j, value=X.random)
}

Unfortunately, the set function doesn't write normally distributed random numbers to the corresponding columns X.1:X.4, only boolean values.
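A likely explanation for the boolean values, offered as a sketch rather than a confirmed diagnosis: `:=NA` creates logical columns, and when set is called with a row index i it keeps the existing column type and coerces the new values to it. The first loop did not show this because it replaced each column as a whole. Assuming that is the cause, initialising the columns with NA_real_ should let the loop store doubles. The names x_cols and k, the integer M, the scalar sigma and the seed are choices made for this illustration only.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only to make the illustration reproducible
DT <- data.table(M = 1:5, N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = 10)

# Create the result columns as double NA instead of logical NA
x_cols <- paste0("X.", seq_len(max(DT$N)))
DT[, (x_cols) := NA_real_]

for (k in seq_along(x_cols)) {
  # Rows that need at least k observations
  idx <- which(DT$N >= k)
  # One draw per selected row, using that row's mu and sigma
  set(DT, i = idx, j = x_cols[k],
      value = rnorm(length(idx), mean = DT$mu[idx], sd = DT$sigma[idx]))
}
```

Each pass draws exactly one number per selected row, so the parameters are applied row by row, and rows with N < k keep NA in column X.k.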

BrodieG · Accepted Answer

You can use dcast:

dcast(
  DT[, .(id=1:N, val=rnorm(N, mu, sigma)), by=.(M, N, mu, sigma)], 
  M + mu + sigma ~ id, value.var="val"
)

Produces:

   M    mu sigma           1           2           3        4
1: 1     1    10   -5.779204   -3.060535          NA       NA
2: 2    10    10   13.070796   15.765328    12.30571       NA
3: 3   100    10   99.720755          NA          NA       NA
4: 4  1000    10  998.277712          NA          NA       NA
5: 5 10000    10 9999.507019 9997.459322 10010.48480 10003.36

Though really you should probably keep the data in long format (i.e. the first argument to dcast above), as data in that format is typically much more amenable to analysis.
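To illustrate the long-format point, here is a minimal sketch that builds the long table once and summarises it per simulation without reshaping. The names long and stats, the summary columns and the seed are chosen for this example only; seq_len(N) is used in place of 1:N as a defensive habit in case N is ever 0.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only to make the illustration reproducible
DT <- data.table(M = 1:5, N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = 10)

# One row per simulated observation: N rows for each simulation M
long <- DT[, .(id = seq_len(N), val = rnorm(N, mu, sigma)),
           by = .(M, N, mu, sigma)]

# Per-simulation summaries are a plain grouped aggregation in long format
stats <- long[, .(n = .N, mean_val = mean(val)), by = M]
```

There are no NA padding cells to work around here, and the wide layout can still be produced with dcast at the end if it is needed for display.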
