Reputation: 117
I have a dataset named by "Tree_all_exclusive" of 7607 rows and 39 column, which contains different information of tress such as age, height, name etc. I am able to create a sample of 1200 size with the below code, which looks picking trees randomly:
sam1<-sample_n(Tree_all_exclusive, size = 1200)
But I like to generate a proportionate stratified sample of 1200 trees which will pick the number of trees according to the proportion of the number of that specific type of tree.
To do this I am using below code:
sam3<-Tree_all_exclusive %>%
group_by(TaxonNameFull)%>%
summarise(total_numbers=n())%>%
arrange(-total_numbers)%>%
mutate(pro = total_numbers/7607)%>% #7607 total number of trees
mutate(sz= pro*1200)%>% #1200 is number of sample
mutate(siz=as.integer(sz)+1) #since some size is 0.01 so making it 1
sam3
s<-stratified(sam3, group="TaxonNameFull", sam3$siz)
But it is giving me the below error:
Error in s_n(indt, group, size) : 'size' should be entered as a named vector.
Would you please point me any direction to solve this issue?
Also if there is any other way to do the stratified sampling with proportionate number please guide me.
Thanks a lot.
Upvotes: 2
Views: 130
Reputation: 22044
How about using sample_frac()
:
library(dplyr)
data(mtcars)
mtcars %>%
group_by(cyl) %>%
tally()
#> # A tibble: 3 × 2
#> cyl n
#> <dbl> <int>
#> 1 4 11
#> 2 6 7
#> 3 8 14
mtcars %>%
group_by(cyl) %>%
sample_frac(.5) %>%
tally()
#> # A tibble: 3 × 2
#> cyl n
#> <dbl> <int>
#> 1 4 6
#> 2 6 4
#> 3 8 7
Created on 2023-01-24 by the reprex package (v2.0.1)
Upvotes: 1