Reputation: 2660
I'm aware there is a sample_n function in dplyr, but I don't know how to pick a sample with weights.
For example:
library(dplyr)

iris %>%
  group_by(Species) %>%
  sample_n(size = 30)
This returns 30 observations from each group (90 rows in total).
But I want 30 observations in total, split across the groups by weight, e.g. 70% from group 1, 20% from group 2 and 10% from group 3.
Thanks in advance.
Upvotes: 0
Views: 534
Reputation: 919
Borrowing from the link KoenV has posted in the comments:
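library(dplyr)
library(purrr)

sample_size <- 30
# proportions are matched by position to the group_split() order,
# i.e. the factor levels: setosa, versicolor, virginica
groups <- c(0.7, 0.1, 0.2)
group_size <- round(sample_size * groups)  # rows to draw per group: 21, 3, 6

iris %>%
  group_split(Species) %>%
  map2_dfr(group_size, ~ slice_sample(.x, n = .y))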
# A tibble: 30 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 4.8 3.1 1.6 0.2 setosa
2 4.8 3.4 1.6 0.2 setosa
3 5.1 3.4 1.5 0.2 setosa
4 4.4 3 1.3 0.2 setosa
5 4.6 3.4 1.4 0.3 setosa
6 5.5 4.2 1.4 0.2 setosa
7 5.5 3.5 1.3 0.2 setosa
8 4.9 3 1.4 0.2 setosa
9 5.1 3.8 1.9 0.4 setosa
10 5.7 4.4 1.5 0.4 setosa
Counting the rows per species confirms the split:
# A tibble: 3 × 2
Species n
<fct> <int>
1 setosa 21
2 versicolor 3
3 virginica 6
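The answer's weights are matched to groups by position, so c(0.7, 0.1, 0.2) lines up with the alphabetical factor levels (setosa, versicolor, virginica), which is why the counts come out 21, 3 and 6. As a base-R sketch under stated assumptions, the version below keys the proportions by group name instead, and uses the largest-remainder method so the per-group counts always add up to the requested total even when sample_size * proportion is not a whole number. allocate_counts() and sample_by_props() are hypothetical helpers written for this illustration, not functions from dplyr or purrr.

```r
# Hypothetical helper: split `total` draws across groups in proportion to
# `props`, using the largest-remainder method so the counts sum to `total`.
allocate_counts <- function(props, total) {
  props <- props / sum(props)            # normalise in case weights don't sum to 1
  raw <- props * total
  counts <- floor(raw)
  short <- total - sum(counts)           # draws still to hand out
  if (short > 0) {
    # one extra draw each for the groups with the largest fractional parts
    extra <- order(raw - counts, decreasing = TRUE)[seq_len(short)]
    counts[extra] <- counts[extra] + 1
  }
  counts
}

# Hypothetical helper: draw `total` rows from `df`, stratified by the column
# `group_col`; `props` is a named vector whose names are the group labels.
sample_by_props <- function(df, group_col, props, total) {
  counts <- allocate_counts(props, total)
  names(counts) <- names(props)
  pieces <- lapply(names(counts), function(g) {
    rows <- which(df[[group_col]] == g)
    stopifnot(counts[[g]] <= length(rows))  # can't draw more rows than the group has
    # sample.int() avoids sample()'s special case for a length-1 vector
    df[rows[sample.int(length(rows), counts[[g]])], , drop = FALSE]
  })
  do.call(rbind, pieces)
}

set.seed(1)
props <- c(setosa = 0.7, versicolor = 0.2, virginica = 0.1)
res <- sample_by_props(iris, "Species", props, total = 30)
table(res$Species)  # setosa 21, versicolor 6, virginica 3
```

Because the proportions are looked up by name, reordering props cannot silently shift a weight onto the wrong species. With the answer's positional vector, getting the 70/20/10 split from the question would need groups <- c(0.7, 0.2, 0.1).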
Upvotes: 1