Reputation: 5646
I often need to write something like
sample_size = 10^4
my_data <- data.frame(x1 = runif(sample_size, 0,3), x2 = runif(sample_size, 0,3), x3 = runif(sample_size, 0,3), x4 = runif(sample_size, 0,3))
in order to test some statistical models. For example,
error <- rnorm(sample_size, 0, 0.1)
y <- with( my_data, 2*x1+0.1*(x2 + x3 + x4)) + error
my_model <- lm(y ~ ., data = my_data)
Since my_data is used as input to lm, it has to be a data frame (or a list).
I wonder whether invoking runif 4 times is the right way to do this, or whether there is a better solution. I tried
my_data <- matrix(runif(4*sample_size, 0,3), sample_size, 4, dimnames = list(NULL, paste0("x", 1:4)))
my_data <- as.data.frame(my_data)
But it doesn't seem so readable to me.
Upvotes: 0
Views: 69
Reputation: 145755
There are a few ways to do this. Say you want ncol columns; here are some good options:
ncol = 4
sample_size = 10
replicate(ncol, runif(sample_size, 0, 3))
matrix(runif(sample_size * ncol, 0, 3), ncol = ncol)
sapply(1:ncol, function(x) runif(sample_size, 0, 3))
These all create matrices, which you can, of course, convert to data frames as needed. The differences are minor: replicate is essentially a nice wrapper for sapply, and the direct matrix method may be slightly faster, though the difference is probably only a few milliseconds.
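To connect this back to the question, here is a minimal sketch of the replicate approach carried through to the lm call. The names n_col (used instead of ncol only to avoid shadowing the base function) and the set.seed call are assumptions added for illustration; the column names x1..x4 and the model come from the question.

```r
# Assumption: fixed seed, only so the sketch is reproducible.
set.seed(1)

n_col <- 4            # hypothetical name; avoids shadowing base::ncol
sample_size <- 10^4

# One runif() call per column, bound into a sample_size x n_col matrix,
# then converted to a data frame with the column names lm() will see.
my_data <- as.data.frame(replicate(n_col, runif(sample_size, 0, 3)))
names(my_data) <- paste0("x", seq_len(n_col))

# Same simulated response and model as in the question.
error <- rnorm(sample_size, 0, 0.1)
y <- with(my_data, 2 * x1 + 0.1 * (x2 + x3 + x4)) + error
my_model <- lm(y ~ ., data = my_data)

coef(my_model)
```

Changing n_col is then the only edit needed to get more predictors, although the with() expression for y would have to be adapted to match.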
Upvotes: 1