Reputation: 1224
I have the following dataset.
It contains 3 character variables (A, B, C).
Variable A has 13 levels, and I want to take a random sample of size n = 30 from each category. I want the final dataset to contain all the sampled rows together with the corresponding B.
I tried
data %>%
  group_by(B) %>%
  sample_n(size = 30, replace = TRUE)
but it didn't work. Any help?
Upvotes: 0
Views: 326
Reputation: 2670
I checked your data: some of the groups you created (like DRH) have fewer than 30 observations. Since you passed the replace = TRUE argument, R samples with replacement, so it duplicates rows whenever a group has fewer than 30 observations. You can basically remove the argument (while keeping only the groups with at least 30 observations), or do it this way:
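library(dplyr)

# Attach each group's size so the groups can be split by it
grouped_data <- data %>%
  group_by(GEAR) %>%
  mutate(size = n())

# Groups with at least 30 observations: sample 30 rows from each
part_1 <- grouped_data %>%
  filter(size >= 30) %>%
  sample_n(size = 30, replace = TRUE) %>%
  ungroup() %>%
  select(-size)

# Groups with fewer than 30 observations: keep all rows
part_2 <- grouped_data %>%
  filter(size < 30) %>%
  ungroup() %>%
  select(-size)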
I kept the groups with at least 30 observations and took 30 random observations from each of them; the result is part_1. part_2 contains the groups with fewer than 30 observations, left unsampled.
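If you want everything in a single data frame, one option is to stack part_1 and part_2 with bind_rows(). Below is a minimal sketch on made-up data; the toy tibble, its value column and the group names "big" and "small" are assumptions for illustration only, not your real data. It also leaves out replace = TRUE so that no row is drawn twice, which is a choice you may or may not want.

```r
library(dplyr)

set.seed(42)  # only to make the toy example reproducible

# Hypothetical toy data standing in for the real dataset:
# group "big" has 50 rows, group "small" has only 10.
toy <- tibble(
  GEAR  = c(rep("big", 50), rep("small", 10)),
  value = seq_len(60)
)

# Same idea as grouped_data: attach each group's size.
grouped_toy <- toy %>%
  group_by(GEAR) %>%
  mutate(size = n())

# Groups with at least 30 rows: draw 30 distinct rows (no replacement).
part_1 <- grouped_toy %>%
  filter(size >= 30) %>%
  sample_n(size = 30) %>%
  ungroup() %>%
  select(-size)

# Smaller groups: keep every row unchanged.
part_2 <- grouped_toy %>%
  filter(size < 30) %>%
  ungroup() %>%
  select(-size)

# Stack both parts into one final data frame.
final <- bind_rows(part_1, part_2)
count(final, GEAR)
```

In this toy case final has 40 rows: 30 sampled from "big" and all 10 from "small".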
Upvotes: 2