Reputation: 143
In R I have a data.table of the following structure:
DT <- data.table(M=c(1,2,3,4,5), N=c(2,3,1,1,4), mu=c(1,10,100,1000,10000), sigma=c(10,10,10,10,10))
Here M is the simulation number, N is the number of observations, and mu and sigma are the parameters of the normal distribution. For each row I want to generate N random numbers from a normal distribution with that row's parameters mu and sigma. For example, look at the second row: it needs 3 normally distributed random numbers with mu=10 and sigma=10, generated by
rnorm(3,10,10)
These random numbers should be written into DT. To do this I add as many columns to DT as the maximum of N:
DT[, paste0("X.", seq(1, max(DT[, N]))):=NA]
So for simulation M=2 I want to update only columns X.1, X.2 and X.3 with three normally distributed random numbers with mu=10 and sigma=10. How can I do that efficiently for a really big data.table?
I have tried to solve this problem with a for loop over the columns using the set function:
for (j in 5:ncol(DT)) {
  X.random <- rnorm(n=DT[, N], mean=DT[, mu], sd=DT[, sigma])
  set(DT, j=j, value=X.random)
}
But this way the "condition" N, the number of observations, is not taken into account, because all columns X.1:X.4 are updated. I am also not sure whether the parameters are taken per row. How can I do that?
Edit: Without reading your answers I've updated the for loop:
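On the per-row question: rnorm() is vectorised over mean and sd, so the i-th draw uses the i-th mean and the i-th sd. A minimal base R sketch (the seed and the names x and y are only for illustration) also shows that a vector passed as n is not read as "draws per row":

```r
set.seed(123)  # assumption: fixed seed only so the sketch is reproducible

mu <- c(1, 10, 100, 1000, 10000)

# One draw per element: draw i uses mean mu[i]; sd = 10 is recycled
x <- rnorm(n = length(mu), mean = mu, sd = 10)

# If n is a vector of length > 1, its LENGTH is used as the number of draws,
# so this also returns 5 values, not 2 + 3 + 1 + 1 + 4 = 11
y <- rnorm(n = c(2, 3, 1, 1, 4), mean = mu, sd = 10)

length(x)
length(y)
```

So the loop above draws one value per row for every column, with the correct per-row parameters, but it never looks at N.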
for (j in 5:ncol(DT)) {
  idx <- which(DT[, N]-(j-4) >= 0)
  X.random <- rnorm(n=DT[idx, N], mean=DT[idx, mu], sd=DT[idx, sigma])
  set(DT, i=idx, j=j, value=X.random)
}
Unfortunately the set function doesn't write normally distributed random numbers to the corresponding columns X.1:X.4, only boolean values.
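A likely cause of the boolean values is that the X columns were created with a plain NA, which is logical, so values assigned later are coerced to the existing logical column type. A minimal sketch of the same column-wise loop, assuming the columns are created with NA_real_ instead (the names x_cols and k are hypothetical):

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
DT <- data.table(M = 1:5, N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = rep(10, 5))

# Create the result columns as numeric (NA_real_) rather than logical (NA)
x_cols <- paste0("X.", seq_len(max(DT$N)))
DT[, (x_cols) := NA_real_]

# Column k is filled only for the rows that need at least k observations
for (k in seq_along(x_cols)) {
  idx <- which(DT$N >= k)
  set(DT, i = idx, j = x_cols[k],
      value = rnorm(length(idx), mean = DT$mu[idx], sd = DT$sigma[idx]))
}

DT
```

Addressing the columns by name rather than by position (5:ncol(DT)) also avoids depending on how many columns come before X.1.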
Upvotes: 2
Views: 686
Reputation: 52637
You can use dcast:
dcast(
  DT[, .(id=1:N, val=rnorm(N, mu, sigma)), by=.(M, N, mu, sigma)],
  M + mu + sigma ~ id, value.var="val"
)
Produces:
   M    mu sigma           1           2           3        4
1: 1     1    10   -5.779204   -3.060535          NA       NA
2: 2    10    10   13.070796   15.765328    12.30571       NA
3: 3   100    10   99.720755          NA          NA       NA
4: 4  1000    10  998.277712          NA          NA       NA
5: 5 10000    10 9999.507019 9997.459322 10010.48480 10003.36
Though really you should probably keep the data in long format (i.e. the first argument to dcast above), as data in that format is typically much more amenable to analysis.
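To illustrate the long-format suggestion, here is a minimal sketch that builds the long table once and summarises it per simulation without reshaping. The names long and summary_by_M and the fixed seed are assumptions for illustration:

```r
library(data.table)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
DT <- data.table(M = c(1, 2, 3, 4, 5), N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = c(10, 10, 10, 10, 10))

# One row per simulated observation; inside each group N, mu and sigma
# are single values, so rnorm() draws N numbers with that row's parameters
long <- DT[, .(id = seq_len(N), val = rnorm(N, mu, sigma)),
           by = .(M, N, mu, sigma)]

# Per-simulation summaries work directly on the long table
summary_by_M <- long[, .(n = .N, mean_val = mean(val)), by = M]
summary_by_M
```

Note that .N (the group size) is different from the column N, although here the two agree by construction.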
Upvotes: 2
Reputation: 1569
I'm admittedly new to data.table, but this code seems to work (although it throws an error). I loop through the rows rather than the columns and build the column names within the loop.
DT <- data.table(M=c(1,2,3,4,5), N=c(2,3,1,1,4), mu=c(1,10,100,1000,10000), sigma=c(10,10,10,10,10))
for (i in 1:nrow(DT)){
  X.random <- rnorm(n=DT[i, N], mean=DT[i, mu], sd=DT[i, sigma])
  j=paste0("X.", seq(1, DT[i, N]))
  set(DT, i=i,j=j, value=X.random)
}
DT
   M N    mu sigma         X.1         X.2        X.3      X.4
1: 1 2     1    10   -2.286063   -2.286063         NA       NA
2: 2 3    10    10   13.843578   13.843578   13.84358       NA
3: 3 1   100    10   97.616599          NA         NA       NA
4: 4 1  1000    10 1014.386157          NA         NA       NA
5: 5 4 10000    10 9992.771152 9992.771152 9992.77115 9992.771
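Note that in the output above every filled column of a row holds the same number, so only the first draw per row is actually stored. This appears to happen because an atomic value passed to set() is treated as a single column's worth of data and reused for each column in j. A minimal sketch of one possible fix, assuming the X columns are created up front as numeric and the draws are passed as a list with one element per target column (the names x_cols, n_i and draws are hypothetical):

```r
library(data.table)

set.seed(7)  # assumption: fixed seed only so the sketch is reproducible
DT <- data.table(M = c(1, 2, 3, 4, 5), N = c(2, 3, 1, 1, 4),
                 mu = c(1, 10, 100, 1000, 10000), sigma = c(10, 10, 10, 10, 10))

# Pre-create numeric result columns
x_cols <- paste0("X.", seq_len(max(DT$N)))
DT[, (x_cols) := NA_real_]

for (i in seq_len(nrow(DT))) {
  n_i <- DT$N[i]
  draws <- rnorm(n_i, mean = DT$mu[i], sd = DT$sigma[i])
  # as.list() gives one list element per column, so each column gets its own draw
  set(DT, i = i, j = x_cols[seq_len(n_i)], value = as.list(draws))
}

DT
```

A row-by-row loop will still be slow on a really big table, so this is mainly useful for seeing why the values repeat.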
Upvotes: 0