kamath

Reputation: 143

R: Update columns in a data.table

In R I have a data.table of the following structure:

library(data.table)
DT <- data.table(M=c(1,2,3,4,5), N=c(2,3,1,1,4), mu=c(1,10,100,1000,10000), sigma=c(10,10,10,10,10))

Here M is the simulation number, N is the number of observations, and mu and sigma are the parameters of the normal distribution. For each row I want to generate N random numbers from a normal distribution with that row's mu and sigma. For example, for the second row I would generate 3 normally distributed random numbers with mu=10 and sigma=10 by

rnorm(3,10,10)

These normally distributed random numbers should be written into DT. To hold them, I add as many columns to DT as the maximum of N:

DT[, paste0("X.", seq(1, max(DT[, N]))):=NA]

So for simulation M=2 I want to update only columns X.1, X.2 and X.3 with three normally distributed random numbers with mu=10 and sigma=10. How can I do that efficiently for a really big data.table?

I have tried to solve this problem by a for-loop over the columns using the set-function

for (j in 5:ncol(DT)) {
     X.random <- rnorm(n=DT[, N], mean=DT[, mu], sd=DT[, sigma])
     set(DT, j=j, value=X.random)
}

But this way the "condition" N, the number of observations, is not taken into account, because all columns X.1:X.4 are updated in every row. I am also not sure whether the parameters are taken per row. How can I do that?
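On the per-row question, a small base R check may help. It relies on documented rnorm behaviour: when n is a vector, its length is used as the number of draws, and mean and sd are recycled element by element. The vectors n and mu below are made up for illustration, and the tiny sd is only there so each draw stays visibly close to its own mean.

```r
set.seed(1)  # only for reproducibility of the illustration

n  <- c(2, 3, 1)        # a vector n: rnorm uses length(n), i.e. 3 draws
mu <- c(1, 1000, 1e6)   # one mean per draw

# Draw k uses mu[k]; the sd of 0.001 is recycled for all draws
x <- rnorm(n = n, mean = mu, sd = 0.001)

length(x)      # 3, not sum(n)
round(x - mu)  # all zero: each draw sits next to its own mean
```

So the loop above produces one draw per row from that row's mu and sigma, and the values in N only determine how many draws there are through the length of the vector.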

Edit: Before reading your answers, I updated the for-loop:

for (j in 5:ncol(DT)) {
    idx <- which(DT[, N]-(j-4) >= 0)
    X.random <- rnorm(n=DT[idx, N], mean=DT[idx, mu], sd=DT[idx, sigma])
    set(DT, i=idx, j=j, value=X.random)
}

Unfortunately, the set-function doesn't write normally distributed random numbers to the corresponding columns X.1:X.4, only boolean values.
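One plausible cause of the boolean values is that a bare NA is of type logical in R, so the X.* columns are created as logical columns and the numeric draws get coerced on assignment. A minimal sketch of the same column-wise loop, assuming that is the problem, creates the columns with NA_real_ instead. The names x_cols and k are illustrative and not from the original code.

```r
library(data.table)
set.seed(1)  # only for reproducibility of the illustration

DT <- data.table(M = c(1, 2, 3, 4, 5), N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = rep(10, 5))

# Create the result columns as double, not logical
x_cols <- paste0("X.", seq_len(max(DT$N)))
DT[, (x_cols) := NA_real_]

for (k in seq_along(x_cols)) {
  # Rows that need at least k observations
  idx <- which(DT$N >= k)
  # One draw per selected row, using that row's mu and sigma
  set(DT, i = idx, j = x_cols[k],
      value = rnorm(length(idx), mean = DT$mu[idx], sd = DT$sigma[idx]))
}

DT
```

Row r then has exactly N[r] filled X.* columns, and the remaining ones stay NA.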

Upvotes: 2

Views: 686

Answers (2)

BrodieG

Reputation: 52637

You can use dcast:

dcast(
  DT[, .(id=1:N, val=rnorm(N, mu, sigma)), by=.(M, N, mu, sigma)], 
  M + mu + sigma ~ id, value.var="val"
)

Produces:

   M    mu sigma           1           2           3        4
1: 1     1    10   -5.779204   -3.060535          NA       NA
2: 2    10    10   13.070796   15.765328    12.30571       NA
3: 3   100    10   99.720755          NA          NA       NA
4: 4  1000    10  998.277712          NA          NA       NA
5: 5 10000    10 9999.507019 9997.459322 10010.48480 10003.36

Though really you should probably keep the data in long format (i.e. the first argument to dcast above), as data in that format is typically much more amenable to analysis.
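As a rough sketch of what working in long format might look like: the block below builds the same long table that is passed to dcast above and then summarises it per simulation. The names long and summary_by_sim are illustrative, and the seed is only there so the example is reproducible.

```r
library(data.table)
set.seed(42)  # only for reproducibility of the illustration

DT <- data.table(M = c(1, 2, 3, 4, 5), N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = rep(10, 5))

# One row per (simulation, observation): id is the observation index
long <- DT[, .(id = seq_len(N), val = rnorm(N, mu, sigma)),
           by = .(M, N, mu, sigma)]

# Per-simulation summaries are a single grouped expression in long format
summary_by_sim <- long[, .(n_obs = .N, mean_val = mean(val)), by = M]

long
summary_by_sim
```

No NA padding is needed here, since a simulation with fewer observations simply has fewer rows.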

Upvotes: 2

Jason

Reputation: 1569

I'm admittedly new to the data table world but this code seems to work (although it throws an error). I loop through the rows rather than the columns and assign a column name within the loop.

DT <- data.table(M=c(1,2,3,4,5), N=c(2,3,1,1,4), mu=c(1,10,100,1000,10000), sigma=c(10,10,10,10,10))


for (i in 1:nrow(DT)) {
    # Draw N[i] values using the parameters of row i
    X.random <- rnorm(n=DT[i, N], mean=DT[i, mu], sd=DT[i, sigma])
    # Names of the first N[i] X.* columns
    j <- paste0("X.", seq(1, DT[i, N]))
    set(DT, i=i, j=j, value=X.random)
}

DT

   M N    mu sigma         X.1         X.2        X.3      X.4
1: 1 2     1    10   -2.286063   -2.286063         NA       NA
2: 2 3    10    10   13.843578   13.843578   13.84358       NA
3: 3 1   100    10   97.616599          NA         NA       NA
4: 4 1  1000    10 1014.386157          NA         NA       NA
5: 5 4 10000    10 9992.771152 9992.771152 9992.77115 9992.771
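Note that in the output above every filled cell within a row holds the same value, so only one draw per row ends up in the table. A likely explanation is that a plain vector passed as value is treated as the values for a single column rather than as one value per column. A hedged sketch of a row-wise variant passes the draws as a list, one element per target column. It also assumes the X.* columns are created up front as NA_real_; the names x_cols, n_i and draws are illustrative.

```r
library(data.table)
set.seed(1)  # only for reproducibility of the illustration

DT <- data.table(M = c(1, 2, 3, 4, 5), N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = rep(10, 5))

# Create the result columns as double before filling them
x_cols <- paste0("X.", seq_len(max(DT$N)))
DT[, (x_cols) := NA_real_]

for (i in seq_len(nrow(DT))) {
  n_i   <- DT$N[i]
  draws <- rnorm(n_i, mean = DT$mu[i], sd = DT$sigma[i])
  # as.list() gives set() one value per column instead of one recycled vector
  set(DT, i = i, j = x_cols[seq_len(n_i)], value = as.list(draws))
}

DT
```

Looping over rows calls set() once per row, so for a very large table the column-wise loop or the long-format approach is probably the better fit.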

Upvotes: 0
