Select a random sample within levels of a factor, unequal stratum size per factor level

Question

I would like to select a random sample of my dataframe within levels of a factor. I can get a sample within the factor levels using ddply:

library(plyr)  # ddply() is provided by plyr, not dplyr
newdf <- ddply(iris, ~Species, function(x) {
  # draw 2 random rows from each Species subset
  x[sample(nrow(x), 2), ]
})
with(newdf,table(Species))

However, I don't want to sample the same number of observations from each factor level. Rather, I want to sample, say, 2, 3 and 4 observations from the 3 levels of Species (i.e. 2 from setosa, 3 from versicolor, 4 from virginica). How can I do this?

Can I create a vector of values, e.g. c(2,3,4) to be cycled through with each dataframe split by ddply?

The values in that vector need to be specified explicitly: they are neither a fixed proportion of each group nor the same number for every group.

akrun · Accepted Answer

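We split the dataset by 'Species', use Map to pair each subset with its sample size and draw that many random rows from it, and rbind the list output. Note that the row indices must be drawn with sample(nrow(x), y); sample(y) alone would only shuffle the first y rows of each subset.

 do.call(rbind, Map(function(x, y) x[sample(nrow(x), y), ], split(iris, iris$Species), 2:4))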
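The one-liner above relies on the sizes being given in the same order as the factor levels. As a minimal sketch of a slightly more defensive variant, the sizes can be kept in a named vector and looked up by level name. The names `sizes`, `groups` and `sampled` are made up for illustration, and `set.seed()` is there only to make the draw reproducible.

```r
# Hypothetical per-level sample sizes, named by factor level (illustrative values).
sizes <- c(virginica = 4, setosa = 2, versicolor = 3)

set.seed(42)  # only so the example is reproducible

# One data frame per level of Species.
groups <- split(iris, iris$Species)

# Look up each stratum's size by name, so the order of `sizes` does not matter,
# then draw that many random rows from each stratum and stack the results.
sampled <- do.call(rbind, Map(function(x, n) x[sample(nrow(x), n), ],
                              groups, sizes[names(groups)]))

table(sampled$Species)
```

The final table should show 2 rows for setosa, 3 for versicolor and 4 for virginica, regardless of the order in which the sizes were written.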
