Sample replication

Question

I have a data frame (d) composed of 640 observations for 55 variables.

I would like to randomly split this data frame into 10 sub data frames of 64 observations each, keeping all 55 variables. I don't want any observation to appear in more than one sub data frame.

This code works for one sample:

d1 <- d[sample(nrow(d),64,replace=F),]

How can I repeat this ten times?

This attempt only gives me a matrix with 10 columns (each column is one sample of row indices):

d1 <- replicate(10,sample(nrow(d),64,replace = F))

Can anyone help me?

gagolews · Accepted Answer

Here's a solution that returns the result in a list of data.frames:

d <- data.frame(A=1:640, B=sample(LETTERS, 640, replace=TRUE)) # an example data.frame
idx <- sample(rep(1:10, length.out=nrow(d)))
res <- split(d, idx)
res[[1]] # first data frame
res[[10]] # last data frame

The only tricky part is creating idx. idx[i] is the number, in {1,...,10}, of the resulting data.frame that will contain the ith row of d. Because every row gets exactly one such label, no row can end up in more than one data.frame.

Also, note that sample here returns a random permutation of rep(1:10, length.out=nrow(d)), i.e. of (1,2,...,10,1,2,...,10,...). With 640 rows, each label occurs exactly 64 times, so every resulting data.frame has 64 rows.
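To make the reasoning above concrete, here is a minimal sketch that rebuilds idx and checks the two properties claimed: equal group sizes and no row shared between groups. The fixed seed and the helper variable all_rows are assumptions added only for illustration; they are not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Example data with the same number of rows as in the question
d <- data.frame(A = 1:640, B = sample(LETTERS, 640, replace = TRUE))

# One group label per row: each of 1..10 appears 64 times, in random order
idx <- sample(rep(1:10, length.out = nrow(d)))
res <- split(d, idx)

# Group sizes: how often each label occurs, and how many rows each piece has
table(idx)
sapply(res, nrow)

# Column A holds the original row numbers, so collecting it across all
# pieces shows whether any row was used twice (0 means no duplicates)
all_rows <- unlist(lapply(res, function(x) x$A))
anyDuplicated(all_rows)
```

If nrow(d) were not a multiple of 10, rep(..., length.out=nrow(d)) would still label every row exactly once, and the group sizes would simply differ by at most one.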

Another approach is to use:

apply(matrix(sample(nrow(d)), ncol=10), 2, function(idx) d[idx,])
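The same idea can be spelled out step by step. The sketch below swaps apply for lapply over the matrix columns, which is a variation rather than the answer's exact code, and the names m and res2 are hypothetical. It assumes nrow(d) is divisible by 10; otherwise matrix() would recycle indices (with a warning) and some rows would be duplicated.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

d <- data.frame(A = 1:640, B = sample(LETTERS, 640, replace = TRUE))

# Guard: the matrix layout only partitions the rows cleanly when the
# row count is a multiple of the number of groups
stopifnot(nrow(d) %% 10 == 0)

# Shuffle all row indices once, then lay them out as 64 rows x 10 columns;
# each column holds the row indices of one sub data.frame
m <- matrix(sample(nrow(d)), ncol = 10)

# Build one data.frame per column; lapply always returns a list
res2 <- lapply(seq_len(ncol(m)), function(j) d[m[, j], ])

length(res2)        # number of sub data.frames
sapply(res2, nrow)  # rows in each sub data.frame
```

Because the indices come from a single permutation of 1..nrow(d), each row index appears in exactly one column of m.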
