Thirst for Knowledge

Reputation: 1628

Selecting randomly sized, random subsets of rows

I'm following this question on extracting a random subset of rows.

My data look like:

scenario   urban_areas_simple       place      population
North       Primary Urban Areas     Leeds      700,000
South       Primary Urban Areas     London     9,000,000
Scotland    Rural                   Shetland   22,000
...         ...                     ...        ...

Using dplyr I have the following code, which works and randomly selects 4 rows based on conditions in my scenario and urban_areas_simple columns:

filter(lads, 
    scenario == "north" & urban_areas_simple == "Primary Urban Areas") %>% 
    sample_n(4) 

However, I also want to randomise the number of rows selected; I've only chosen 4 here as an arbitrary example.

How would I randomly select rows meeting these conditions, for subsets of a random size?

NB: there may only be between 10 and 50 rows meeting each condition.

Upvotes: 1

Views: 78

Answers (2)

Robin Gertenbach

Reputation: 10776

filter(lads, 
  scenario == "north" & urban_areas_simple == "Primary Urban Areas") %>% 
  sample_frac(runif(1)) 

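does just that: runif(1) draws a random fraction between 0 and 1, so the size of the subset is random as well as its contents.

Because the fraction never exceeds 1, the requested sample can always be drawn from the rows available, and sample_frac can handle stratified sampling from a grouped data frame with unequal group sizes. Note that a very small fraction can round down to zero rows.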
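As a rough illustration of the grouped case, here is a minimal sketch. The data frame below is a made-up stand-in for lads (its group sizes and place names are assumptions, not the asker's data), and it applies one random fraction to every group:

```r
library(dplyr)

set.seed(1)  # optional: makes the random fraction and the sample reproducible

# Hypothetical toy data: two scenarios with unequal group sizes (20 and 40 rows)
lads <- data.frame(
  scenario = rep(c("north", "south"), times = c(20, 40)),
  place = paste0("place_", 1:60)
)

# One random fraction between 0 and 1, applied within each group
frac <- runif(1)

sampled <- lads %>%
  group_by(scenario) %>%
  sample_frac(frac)

# Number of rows drawn per scenario
count(sampled, scenario)
```

The same fraction is used for every group here. If each group should get its own random size, the fraction would have to be drawn separately per group.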

Upvotes: 0

Roman Luštrik

Reputation: 70623

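Instead of 4, you can use sample(1:100, size = 1). This will pick a random whole number between 1 and 100. Keep the upper bound at or below the number of rows that meet your conditions (10 to 50 in your case), because sample_n throws an error if you ask for more rows than are available when sampling without replacement. If you want to make the process reproducible, call set.seed(x), where x is any integer, before you use any function that depends on the random number generator.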
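A minimal sketch of this approach, assuming a made-up stand-in for lads (the 30 rows, place names and populations below are hypothetical). It takes the upper bound from the number of matching rows rather than a fixed 100, so the drawn size always fits:

```r
library(dplyr)

set.seed(42)  # any integer; makes the draws below reproducible

# Hypothetical toy data: 30 rows that all meet the conditions
lads <- data.frame(
  scenario = rep("north", 30),
  urban_areas_simple = rep("Primary Urban Areas", 30),
  place = paste0("place_", 1:30),
  population = sample(10000:1000000, 30)
)

matching <- filter(lads,
  scenario == "north" & urban_areas_simple == "Primary Urban Areas")

# Draw the subset size from 1..nrow(matching) so it never exceeds the rows available
# (this assumes at least one row matches)
size <- sample(seq_len(nrow(matching)), size = 1)

random_subset <- sample_n(matching, size)
```

Using seq_len(nrow(matching)) instead of 1:100 is a design choice for this sketch; a fixed range also works as long as its maximum does not exceed the row count.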

Upvotes: 1
