Randomly pick rows from a data.frame based on a column value

Question

I was wondering if there is a way to make my expand.grid() output have an unequal number of rows for each unique study value (currently, every study value has exactly 4 rows).

For example, can we randomly pick some or all rows of study == 1, some or all rows of study == 2, and some or all rows of study == 3?

The number of rows picked from each study should be completely random, anywhere from one row up to all of that study's rows.

This is a toy example; a functional answer is appreciated.

library(dplyr)
# 3 studies x 4 outcome rows each = 12 rows
(data <- expand.grid(study = 1:3, outcome = rep(1:2, 2)))
arrange(data, study, outcome)
#   study outcome
#1      1       1
#2      1       1
#3      1       2
#4      1       2 #--- Up to here study == 1
#5      2       1
#6      2       1
#7      2       2
#8      2       2 #--- Up to here study == 2
#9      3       1
#10     3       1
#11     3       2
#12     3       2 #--- Up to here study == 3

Ronak Shah · Accepted Answer

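Within each study, draw a random row count between 1 and n() with sample(n(), 1), then pass that count to sample_n() so it picks that many rows of the study at random.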

library(dplyr)

# For each study, keep a random number (1 to n()) of its rows
data %>% group_by(study) %>% sample_n(sample(n(), 1)) %>% ungroup()
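If you would rather avoid dplyr, the same idea can be sketched in base R: split the row indices by study, draw a random count k between 1 and the study's row count, then keep k of that study's rows chosen without replacement. The helper name `pick_random_rows()` is hypothetical (it is not from any package), and indexing with `sample.int()` is used deliberately so a study with a single row does not hit the `sample(x)` length-one surprise.

```r
# Hypothetical helper (not part of any package): for each value of
# `group_col`, keep a random number of rows (at least 1, at most all
# rows of that group), chosen without replacement.
pick_random_rows <- function(df, group_col) {
  # Row indices of df, split by group value
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  picked <- lapply(idx_by_group, function(idx) {
    k <- sample.int(length(idx), 1)          # how many rows to keep: 1..n
    sort(idx[sample.int(length(idx), k)])    # which rows, in original order
  })
  # split() returns groups in level order, so rows stay grouped by study
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

data <- expand.grid(study = 1:3, outcome = rep(1:2, 2))
set.seed(1)
res <- pick_random_rows(data, "study")
table(res$study)  # per-study counts vary from run to run
```

In recent dplyr versions `sample_n()` is superseded by `slice_sample()`; since the count here differs per group, `slice(sample(n(), sample(n(), 1)))` on the grouped data is another way to express the same selection.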
