Reza

Reputation: 319

Randomly pick rows from a data.frame based on a column value

Is there a way to make my expand.grid() output have an unequal number of rows for each unique study value (currently, each unique study value has 4 rows)?

For example, can we randomly pick some or all rows of study == 1, some or all rows of study == 2, and some or all rows of study == 3?

The number of rows picked from each study is completely random.

This is a toy example; a functional answer is appreciated.

library(dplyr)
(data <- expand.grid(study = 1:3, outcome = rep(1:2,2)))
arrange(data, study, outcome)
#   study outcome
#1      1       1
#2      1       1
#3      1       2
#4      1       2 #--- Up to here study == 1
#5      2       1
#6      2       1
#7      2       2
#8      2       2 #--- Up to here study == 2
#9      3       1
#10     3       1
#11     3       2
#12     3       2 #--- Up to here study == 3                      

Upvotes: 0

Views: 850

Answers (2)

Ronak Shah

Reputation: 389235

For each study, you can draw a random size between 1 and the group size with sample(n(), 1), then pass it to sample_n() to pick that many rows at random.

library(dplyr)

data %>% group_by(study) %>% sample_n(sample(n(), 1)) %>% ungroup
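The same idea can be sketched in base R, in case a version without dplyr is useful. This is only a minimal sketch: the helper name `sample_rows_by_group` is hypothetical (it does not come from this thread), and it assumes every study should keep at least one row.

```r
# Toy data matching the question: 3 studies, 4 rows each
data <- expand.grid(study = 1:3, outcome = rep(1:2, 2))

# Hypothetical helper (not from the thread): for each level of `group_col`,
# keep a random number of rows between 1 and the group size.
sample_rows_by_group <- function(df, group_col) {
  pieces <- lapply(split(df, df[[group_col]]), function(g) {
    k <- sample.int(nrow(g), 1)                # how many rows to keep
    g[sample.int(nrow(g), k), , drop = FALSE]  # which rows to keep
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1)  # only so the toy run is reproducible
res <- sample_rows_by_group(data, "study")
table(res$study)  # number of rows kept per study
```

Using `sample.int()` rather than `sample()` makes it explicit that the draw is from 1 up to the group size.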

Upvotes: 1

Vinícius Félix

Reputation: 8826

If I understood correctly, this should work:

data %>%
  # Group by the variable study
  group_by(study) %>%
  # Sample 3 observations for each study
  sample_n(size = 3)
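The code above keeps a fixed 3 rows per study. If the number of rows should itself vary by study, as the question asks, one possible variation is sketched below using `group_modify()`. It assumes a dplyr version that provides `group_modify()`, and the name `picked` is only illustrative.

```r
library(dplyr)

# Toy data matching the question: 3 studies, 4 rows each
data <- expand.grid(study = 1:3, outcome = rep(1:2, 2))

picked <- data %>%
  group_by(study) %>%
  # .x holds the rows of one study; the inner sample() draws how many rows
  # to keep (1 to group size), the outer sample() draws which rows
  group_modify(~ .x[sample(nrow(.x), sample(nrow(.x), 1)), , drop = FALSE]) %>%
  ungroup()

count(picked, study)  # rows kept per study
```

Each study keeps between 1 and 4 rows here, so the group sizes will usually differ between studies and between runs.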

Upvotes: 0
