B. Davis

Reputation: 3441

Sample a different number of random rows for each level of a factor with dplyr

I am trying to take a random sample from each level of a factor. Each level has a different number of observations, and for each level I want a sample containing half of its observations.

library(dplyr)
dat <- data.frame(ID = rep(c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC"), length.out = 100),
                  Value = sample(1:100, replace = TRUE))

Using the data above, I expected something like the following to nearly work, but it fails with the error "Error in n() : This function should not be called directly", which suggests I am using the n() function incorrectly.

Samp <- dat %>% group_by(ID) %>% sample_n(size = n()/2 )

Thanks in advance.

Upvotes: 2

Views: 1185

Answers (1)

Bryan Goggin

Reputation: 2489

Try sample_frac(), which takes the fraction of rows to sample from each group rather than a fixed count:

library(dplyr)
Samp <- dat %>% group_by(ID) %>% sample_frac(.5)
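
In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(), so the same idea can be sketched as below. This assumes a recent dplyr version; the seed and the comparison table at the end are added only for illustration and are not part of the original question. For groups with an odd number of rows, the exact rounding may differ between sample_frac() and slice_sample().

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the example is reproducible

# Same data as in the question: groups of unequal size
dat <- data.frame(ID = rep(c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC"), length.out = 100),
                  Value = sample(1:100, replace = TRUE))

# Sample half of the rows within each level of ID
Samp <- dat %>%
  group_by(ID) %>%
  slice_sample(prop = 0.5) %>%
  ungroup()

# Compare the original and sampled group sizes side by side
full <- count(dat, ID, name = "n_full")
half <- count(Samp, ID, name = "n_sampled")
left_join(full, half, by = "ID")
```

Each row of the final table should show a sampled count of roughly half the full count for that level.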

Upvotes: 6
