Homer Jay Simpson

Reputation: 1224

Sample in R a category with multiple levels and take a specific sample size from each category

I have the following dataset

It contains 3 character variables (A, B, C).

Variable A has 13 levels, and I want to take a random sample of size n = 30 from each level. I want the final dataset to contain all the sampled rows together with the corresponding B.

I tried

data%>%
  group_by(B)%>%
  sample_n(size=30,replace = TRUE)

but it didn't work. Any help?

Upvotes: 0

Views: 326

Answers (1)

Samet Sökel

Reputation: 2670

I checked your data; some of the groups you created (like DRH) have fewer than 30 observations. Since you passed replace = TRUE, sample_n() samples with replacement, so any group with fewer than 30 observations is padded out to 30 with duplicated rows. You can simply remove that argument (while keeping only the groups with at least 30 observations), or do it this way:

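library(dplyr)

# Add each group's row count as a helper column
grouped_data <- data %>%
  group_by(GEAR) %>%
  mutate(size = n())

# Groups with at least 30 rows: draw 30 rows from each
part_1 <- grouped_data %>%
  filter(size >= 30) %>%
  sample_n(size = 30, replace = TRUE) %>%
  ungroup() %>%
  select(-size)

# Groups with fewer than 30 rows: keep every row
part_2 <- grouped_data %>%
  filter(size < 30) %>%
  ungroup() %>%
  select(-size)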

In part_1, I kept the groups with at least 30 observations and drew 30 random observations from each of them. part_2 contains the groups with fewer than 30 observations, kept in full.
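
If the goal is one final data frame, a shorter route may be slice_sample(), which supersedes sample_n() in dplyr 1.0.0 and later. The sketch below is only an illustration on made-up data: the toy tibble, the value column, and the group labels "big" and "small" are assumptions, and only the GEAR column name comes from the code above. Unlike part_1, it samples without replacement, so no row is repeated; in that mode slice_sample() silently returns a whole group when it has fewer than n rows.

```r
library(dplyr)

set.seed(42)  # only so the illustration is reproducible

# Hypothetical toy data: group "big" has 50 rows, group "small" has 10
toy <- tibble(
  GEAR  = rep(c("big", "small"), times = c(50, 10)),
  value = seq_len(60)
)

sampled <- toy %>%
  group_by(GEAR) %>%
  # Without replacement: up to 30 rows per group,
  # groups with fewer than 30 rows come back whole
  slice_sample(n = 30) %>%
  ungroup()

# Rows per group in the result
count(sampled, GEAR)
```

On the toy data this should give 30 rows for "big" and all 10 rows for "small". With the two-step approach above, bind_rows(part_1, part_2) would stack the two pieces into a single data frame in the same way.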

Upvotes: 2
