Reputation: 79
I would like to select a random sample of my dataframe within levels of a factor. I can get a sample within the factor levels using ddply:
library(plyr)  # ddply() comes from plyr, not dplyr
newdf <- ddply(iris, ~Species, function(x) {
  # draw 2 random rows from each Species subset
  x[sample(nrow(x), 2), ]
})
with(newdf,table(Species))
However, I don't want to simply sample 2 observations from each factor level. Rather, I want to sample, say, 2, 3 and 4 observations from the 3 levels of Species (i.e. 2 from setosa, 3 from versicolor, 4 from virginica). How can I do this?
Can I create a vector of values, e.g. c(2,3,4), to be cycled through with each dataframe split by ddply?
The values in that vector need to be specified - they are not a consistent proportion of all data, nor are they a consistent number.
Upvotes: 2
Views: 2684
Reputation: 887691
We split the dataset by 'Species', use Map to sample the number of observations from each piece, and rbind the list output.
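# sample(nrow(x), y) draws y row indices from the whole group; sample(y) would only shuffle rows 1..y
do.call(rbind, Map(function(x, y) x[sample(nrow(x), y), ], split(iris, iris$Species), 2:4))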
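Since the sample sizes have to be specified per level, it may be safer to match them by name instead of relying on the order of the factor levels. Here is a minimal sketch of that idea in base R; the vector name n_per_level and the variable names are my own, chosen for illustration, and are not from the question.

```r
# Hypothetical lookup table: number of rows to draw for each Species level
n_per_level <- c(setosa = 2, versicolor = 3, virginica = 4)

# One data frame per level, named by level
groups <- split(iris, iris$Species)

# Pair each group with its size by name, then draw that many rows at random
sampled <- do.call(rbind, Map(
  function(grp, n) grp[sample(nrow(grp), n), ],
  groups,
  n_per_level[names(groups)]
))

table(sampled$Species)
```

Indexing n_per_level by names(groups) means the sizes still line up if the levels are reordered, which a bare 2:4 would not guarantee.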
Upvotes: 2