Reputation: 1628
I'm following this question on extracting a random subset of rows.
My data look like:
scenario  urban_areas_simple   place     population
North     Primary Urban Areas  Leeds     700,000
South     Primary Urban Areas  London    9,000,000
Scotland  Rural                Shetland  22,000
... ... ...
Using dplyr, I have the following code, which works and randomly selects 4 rows based on conditions on my scenario and urban_areas_simple columns:
filter(lads,
       scenario == "north" & urban_areas_simple == "Primary Urban Areas") %>%
  sample_n(4)
However, I also want to randomise the number of rows selected; the 4 here is only an arbitrary example.
How would I randomly select rows meeting these conditions, for subsets of a random size?
NB: there may only be between 10 and 50 rows meeting each condition.
Upvotes: 1
Reputation: 10776
filter(lads,
       scenario == "north" & urban_areas_simple == "Primary Urban Areas") %>%
  sample_frac(runif(1))
does just that.
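Because runif(1) returns a fraction between 0 and 1, the requested sample can never be larger than the number of rows available, and sample_frac() can also handle stratified sampling from a grouped data frame with unequal group sizes.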
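For illustration, here is a minimal self-contained sketch of this approach. The toy lads data frame (30 matching rows, made-up place names) and the seed are assumptions for the example only, not the asker's real data. Note that a very small fraction can round down to zero rows.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is repeatable

# Hypothetical stand-in for `lads`: 30 rows that all meet the conditions
lads <- data.frame(
  scenario = "north",
  urban_areas_simple = "Primary Urban Areas",
  place = paste0("place_", 1:30),
  stringsAsFactors = FALSE
)

subset_rows <- lads %>%
  filter(scenario == "north" & urban_areas_simple == "Primary Urban Areas") %>%
  sample_frac(runif(1))  # random fraction, so the size is between 0 and 30 rows

nrow(subset_rows)
```

Each run without the set.seed() call should give a subset of a different size.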
Upvotes: 0
Reputation: 70623
Instead of 4, you can use sample(1:100, size = 1). This will pick a random integer between 1 and 100. If you want to make the process reproducible, put a set.seed(x) call before you start using any function that depends on the random seed; x is any integer.
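Since the question mentions that only 10 to 50 rows may meet each condition, a size drawn from 1:100 can exceed the rows available, and sample_n() rejects that unless replace = TRUE. One possible variation, sketched below, draws the size from 1 up to the number of matching rows instead. The toy lads data frame, the matching and n_rows names, and the seed value are assumptions for the example, and the sketch assumes at least one row matches.

```r
library(dplyr)

set.seed(123)  # assumption: any integer works; makes the draw reproducible

# Hypothetical stand-in for `lads`: 20 rows that all meet the conditions
lads <- data.frame(
  scenario = "north",
  urban_areas_simple = "Primary Urban Areas",
  place = paste0("place_", 1:20),
  stringsAsFactors = FALSE
)

matching <- filter(lads,
                   scenario == "north" & urban_areas_simple == "Primary Urban Areas")

# Draw the subset size from 1..nrow(matching) so it never exceeds the data
n_rows <- sample.int(nrow(matching), size = 1)

result <- sample_n(matching, n_rows)
nrow(result)
```

If an empty result should also be possible, the size could be drawn from 0:nrow(matching) instead.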
Upvotes: 1