Samet Sökel

Reputation: 2660

Randomly select sample from each group using weight

I'm aware that dplyr has a sample_n() function, but I don't know how to pick a sample using weights.
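For context, dplyr's sampling verbs do accept row weights (the weight argument of sample_n() and weight_by of slice_sample()). However, those weights set per-row selection probabilities, so the resulting group counts only approximate the target shares and change from run to run. The sketch below assumes the example shares from this question. The shares vector and the group_n and w column names are illustrative choices, not part of any API.

```r
library(dplyr)

set.seed(123)

# Illustrative target shares per species (assumed from the question's example)
shares <- c(setosa = 0.7, versicolor = 0.2, virginica = 0.1)

weighted_sample <- iris %>%
  # group_n = number of rows in each species (50 each in iris)
  add_count(Species, name = "group_n") %>%
  # spread each species' share evenly over its rows, so the total
  # weight of a species is proportional to its share
  mutate(w = unname(shares[as.character(Species)]) / group_n) %>%
  # draw 30 rows without replacement, with probability proportional to w
  slice_sample(n = 30, weight_by = w)

# counts per species depend on the seed and only roughly follow 21 / 6 / 3
count(weighted_sample, Species)
```

If the split has to be exact, fix the number of rows per group instead of weighting rows.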

For example:

library(dplyr)

iris %>%
  group_by(Species) %>%
  sample_n(size = 30)

This returns 30 observations from each group, i.e. 90 rows in total.

But I want 30 observations in total, split so that, for example, 70% of them come from group 1, 20% from group 2 and 10% from group 3.
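The shares have to become whole numbers of rows that add up to exactly the requested total. With 30 rows the products are already integers (21, 6 and 3), but with other totals they are not. One option is largest-remainder rounding, sketched below. allocate_counts() is a hypothetical helper, not a dplyr function, and it assumes the shares sum to 1.

```r
# Hypothetical helper: split `total` into integer counts proportional to `shares`
# using largest-remainder rounding (shares are assumed to sum to 1).
allocate_counts <- function(total, shares) {
  raw <- total * shares            # ideal, possibly fractional, counts
  counts <- floor(raw)             # round everything down first
  leftover <- total - sum(counts)  # rows still to hand out
  if (leftover > 0) {
    # give one extra row to each group with the largest fractional part
    idx <- order(raw - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[idx] <- counts[idx] + 1
  }
  counts
}

shares <- c(setosa = 0.7, versicolor = 0.2, virginica = 0.1)
allocate_counts(30, shares)  # setosa 21, versicolor 6, virginica 3
allocate_counts(33, shares)  # setosa 23, versicolor 7, virginica 3
```

The resulting counts always sum to the total, so they can be used as per-group sample sizes without the total drifting.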

Thanks in advance.

Upvotes: 0

Views: 534

Answers (1)

jpenzer

Reputation: 919

Borrowing from the link KoenV has posted in the comments:

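library(dplyr)
library(purrr)

sample_size <- 30
# proportions in the order of the Species levels: setosa, versicolor, virginica
groups <- c(0.7, 0.1, 0.2)
# round() guards against non-integer sizes from floating-point products
group_size <- round(sample_size * groups)

# split iris into one tibble per species, then take .y random rows from each
iris %>%
  group_split(Species) %>%
  map2_dfr(group_size, ~ slice_sample(.x, n = .y))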

# A tibble: 30 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          4.8         3.1          1.6         0.2 setosa 
 2          4.8         3.4          1.6         0.2 setosa 
 3          5.1         3.4          1.5         0.2 setosa 
 4          4.4         3            1.3         0.2 setosa 
 5          4.6         3.4          1.4         0.3 setosa 
 6          5.5         4.2          1.4         0.2 setosa 
 7          5.5         3.5          1.3         0.2 setosa 
 8          4.9         3            1.4         0.2 setosa 
 9          5.1         3.8          1.9         0.4 setosa 
10          5.7         4.4          1.5         0.4 setosa 

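Counting the rows per species (for example, by piping the result into count(Species)) confirms the split:

# A tibble: 3 × 2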
  Species        n
  <fct>      <int>
1 setosa        21
2 versicolor     3
3 virginica      6
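The code above matches the proportions to the species by position: group_split() returns the groups in the order of the Species levels, which is why 0.7, 0.1, 0.2 produced 21 setosa, 3 versicolor and 6 virginica rows. If you would rather tie each proportion to a species name, a variant using group_modify() could look like the sketch below. The named shares vector is an assumption chosen to match the question's 70/20/10 example.

```r
library(dplyr)

sample_size <- 30
# proportions keyed by species name, so their order no longer matters
shares <- c(setosa = 0.7, versicolor = 0.2, virginica = 0.1)

set.seed(42)
result <- iris %>%
  group_by(Species) %>%
  # .x is the group's rows, .y a one-row tibble holding the group key
  group_modify(~ slice_sample(
    .x,
    n = round(sample_size * shares[[as.character(.y$Species)]])
  )) %>%
  ungroup()

count(result, Species)  # setosa 21, versicolor 6, virginica 3
```

A misspelled or missing species name makes shares[[...]] fail with an error instead of silently assigning the wrong proportion, which is the main reason to prefer names over positions here.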

Upvotes: 1
