I have the following table giving the abundance of two species across groups:
library(tidyverse)
df <- tribble(~name, ~id, ~freq, ~toselect,
"spA", 22, 10, 4,
"spA", 23, 10, 4,
"spA", 21, 8, 4,
"spA", 19, 6, 4,
"spA", 25, 5, 4,
"spA", 26, 4, 4,
"spA", 27, 4, 4,
"spA", 28, 3, 4,
"spA", 29, 3, 4,
"spA", 24, 2, 4,
"spA", 30, 2, 4,
"spA", 20, 1, 4,
"spA", 31, 1, 4,
"spA", 33, 1, 4,
"spB", 27, 9, 2,
"spB", 28, 1, 2,
"spB", 29, 1, 2,
"spB", 24, 1, 2,
"spB", 30, 1, 2,
"spB", 20, 1, 2,
"spB", 31, 1, 2,
"spB", 33, 1, 2)
For each species I want to sample n rows, where n is a species-specific parameter stored in the tibble (column "toselect"). The rows should be drawn according to the frequency of the species in each group (column "freq"), with duplicates allowed and even wanted (e.g. for spB I actually want the algorithm to select group 27 twice).
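To make "according to the frequency" concrete, here is a minimal sketch, assuming proportional selection means that each row is drawn with probability freq / sum(freq) within its species. It uses only the spB rows and dplyr.

```r
library(dplyr)

# spB rows from the question (enough to illustrate the weights)
spB <- tibble(
  name = "spB",
  id = c(27, 28, 29, 24, 30, 20, 31, 33),
  freq = c(9, 1, 1, 1, 1, 1, 1, 1),
  toselect = 2
)

# Single-draw probability of each row: freq / sum(freq) within the species
probs <- spB %>%
  group_by(name) %>%
  mutate(p = freq / sum(freq)) %>%
  ungroup()

# With toselect = 2 independent draws (with replacement),
# group 27 is picked twice with probability (9/16)^2
p27_twice <- probs$p[probs$id == 27]^2
p27_twice
```

Under that assumption, drawing group 27 twice for spB has probability 81/256 (about 0.32): it is the single most likely outcome, but weighted sampling does not guarantee it.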
I actually ran into two issues. The older sample_n() works well for selecting the desired number of rows:
df %>% group_by(name) %>%
  sample_n(toselect[1], replace = TRUE)
The other option I considered is its successor, slice_sample(). It is a nice function and handles duplicates well. However, it does not accept a different number of rows for each group:
df %>% group_by(name) %>%
  slice_sample(n = 4, replace = TRUE) # instead of 4 I would like to use toselect[1]
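Since n in slice_sample() must be a single number, one possible workaround is to call it once per species with dplyr::group_modify(), reading n from that species' own toselect. This is a sketch assuming dplyr >= 1.0.0 (where slice_sample() and its weight_by argument were introduced); df is the question's data rewritten column by column, and the seed is arbitrary.

```r
library(dplyr)

# The question's data, written column-wise
df <- tibble(
  name = rep(c("spA", "spB"), times = c(14, 8)),
  id = c(22, 23, 21, 19, 25, 26, 27, 28, 29, 24, 30, 20, 31, 33,
         27, 28, 29, 24, 30, 20, 31, 33),
  freq = c(10, 10, 8, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1,
           9, 1, 1, 1, 1, 1, 1, 1),
  toselect = rep(c(4, 2), times = c(14, 8))
)

set.seed(1)
picked <- df %>%
  group_by(name) %>%
  # group_modify() calls the function once per species; .x holds that
  # species' rows (without the grouping column), so n comes from .x$toselect[1]
  group_modify(~ slice_sample(.x, n = .x$toselect[1],
                              weight_by = freq, replace = TRUE)) %>%
  ungroup()

picked
```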
Lastly, neither of these options works for proportional selection. I tried adding the argument weight = freq, but this still produces a random selection. Is there a way to do this?
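Weighted sampling is proportional and random at the same time: each individual draw is random, so the proportions only become visible over many draws. A small simulation sketch for spB using slice_sample()'s weight_by = freq (the 10,000 draws and the seed are arbitrary choices):

```r
library(dplyr)

# spB rows from the question
spB <- tibble(
  id = c(27, 28, 29, 24, 30, 20, 31, 33),
  freq = c(9, 1, 1, 1, 1, 1, 1, 1)
)

set.seed(42)
# Many weighted draws with replacement; each single draw is random
draws <- slice_sample(spB, n = 10000, weight_by = freq, replace = TRUE)

# Share of draws landing on group 27; freq implies 9 / 16 = 0.5625
share_27 <- mean(draws$id == 27)
share_27
```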
Unfortunately, the n argument of slice_sample() is not vectorized: it takes a single number that is applied to every group.
Therefore, you have to use a loop-like function to give each species its own sample size.
Here, I use a combination of dplyr::group_split() and purrr::map_dfr():
library(tidyverse)
set.seed(0)
df %>%
  group_split(name) %>%       # one tibble per species
  map_dfr(~{
    # toselect[1] is evaluated within each species' tibble
    sample_n(.x, toselect[1], replace = TRUE)
  })
#> # A tibble: 6 x 4
#> name id freq toselect
#> <chr> <dbl> <dbl> <dbl>
#> 1 spA 33 1 4
#> 2 spA 29 3 4
#> 3 spA 19 6 4
#> 4 spA 27 4 4
#> 5 spB 27 9 2
#> 6 spB 28 1 2
Created on 2021-05-15 by the reprex package (v2.0.0)
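The code above samples uniformly within each species. Since the question asks for selection proportional to freq, a minimal variant, assuming sample_n()'s weight argument is what is wanted, simply adds weight = freq; because the draws are weighted, its output will differ from the one shown above.

```r
library(dplyr)
library(purrr)

# The question's data, written column-wise
df <- tibble(
  name = rep(c("spA", "spB"), times = c(14, 8)),
  id = c(22, 23, 21, 19, 25, 26, 27, 28, 29, 24, 30, 20, 31, 33,
         27, 28, 29, 24, 30, 20, 31, 33),
  freq = c(10, 10, 8, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1,
           9, 1, 1, 1, 1, 1, 1, 1),
  toselect = rep(c(4, 2), times = c(14, 8))
)

set.seed(0)
picked <- df %>%
  group_split(name) %>%
  # toselect[1] and freq are looked up inside each species' tibble;
  # weight = freq makes rows with a higher freq proportionally more likely
  map_dfr(~ sample_n(.x, toselect[1], replace = TRUE, weight = freq))

picked
```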