TBP

Reputation: 747

Simulating data from a data frame using ddply

I have some plant data almost identical to the 'iris' data set, and I would like to simulate new data from a normal distribution. For each variable~species combination in the iris data set, I want to create 10 new observations drawn from a normal distribution with that group's mean and standard deviation. The result would be a new data frame with the same structure as the old one, but containing simulated data. I feel that the following code should get me started (although I think the resulting data frame would be in the wrong form), but it will not run.

ddply(iris, c("Species"), function(x) data.frame(rnorm(n=10, mean=mean(x), sd=sd(x))))

rnorm returns an atomic vector, so ddply should be able to handle it.

Upvotes: 0

Views: 172

Answers (1)

MrFlick

Reputation: 206167

ddply will subset the rows by Species, but you're doing nothing in the function to iterate over the columns of the subsetted data.frame. You cannot get rnorm() to return a list or data.frame for you; you will need to help with the shaping. How about:

ddply(iris, c("Species"), function(x) {
    data.frame(lapply(x[,1:4], function(y) rnorm(10, mean(y), sd(y))))
})

Here we draw 10 new values for each of the first 4 columns within each Species group.
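If your real data does not have its numeric columns in positions 1 to 4, a slightly more general sketch is to pick the numeric columns by type instead of by index. This assumes plyr is installed and that every numeric column should be simulated; the seed and the names `num` and `sim` are only illustrative.

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

sim <- ddply(iris, "Species", function(x) {
  # Keep only the numeric columns of this Species subset
  # (assumption: all numeric columns should be simulated)
  num <- x[sapply(x, is.numeric)]
  # One rnorm() draw of 10 values per column, using that column's mean and sd
  data.frame(lapply(num, function(y) rnorm(10, mean(y), sd(y))))
})

dim(sim)    # 30 rows (3 species x 10 draws), 5 columns
names(sim)  # Species comes first, followed by the four measurement columns
```

Note that ddply puts the grouping column first, so the column order differs from iris; something like sim[names(iris)] should restore the original order if that matters.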

Upvotes: 1
