llewmills

Reputation: 3568

Selecting all rows that match a randomly selected criterion within dplyr

I am trying to select all rows in a repeated-measures dataset that belong to a randomly selected group of people. I am trying to do it entirely in the tidyverse (for my own edification), but I keep having to fall back on base R functions. Here is how I do it with a combination of base R and dplyr commands.

library(dplyr)
set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))
ids <- sample(unique(df$id), 3)
smallDF <- df %>% dplyr::filter(id %in% ids)
smallDF

#    id      score
# 1   a  0.6869129
# 2   a  1.0663631
# 3   a  0.5367006
# 4   a  1.9060287
# 5   c  1.1677516
# 6   c  0.7926794
# 7   c -1.2135038
# 8   c -1.0056141
# 9   d  0.2085696
# 10  d  0.4461776
# 11  d -0.6208060
# 12  d  0.4413429

I can randomly sample the ids using dplyr...

df %>% distinct(id) %>% sample_n(3)

#   id
# 1  e
# 2  c
# 3  b

...but because the output is a data frame/tibble rather than a vector, I can't see how to take the next step: filtering the original df by the randomly selected ids.
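A minimal sketch of one way to bridge that gap, assuming dplyr >= 1.0.0 (for `slice_sample()`, which supersedes `sample_n()`): `pull()` turns the one-column data frame of sampled ids into a plain vector that `%in%` can use inside `filter()`.

```r
library(dplyr)

set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))

# Sample 3 distinct ids, then pull() them out as a plain character vector
ids <- df %>% distinct(id) %>% slice_sample(n = 3) %>% pull(id)

# Keep every row whose id is among the sampled ones
smallDF <- df %>% filter(id %in% ids)
smallDF
```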

Can anyone help?

Upvotes: 3

Views: 601

Answers (2)

Lala La

Reputation: 1452

df %>% filter(id %in% sample(unique(id), 3))
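A self-contained sketch of this one-liner, assuming `df` is built as in the question. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so `id` is a character column and `levels(id)` returns `NULL`; sampling from `unique(id)` works whether `id` is a character or a factor.

```r
library(dplyr)

set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))

# On an ungrouped data frame, sample(unique(id), 3) is evaluated once,
# so every row belonging to the three chosen ids is kept
smallDF <- df %>% filter(id %in% sample(unique(id), 3))
count(smallDF, id)
```

Because `df` is ungrouped, `sample()` runs once over all ids. On a data frame grouped by `id`, `filter()` would evaluate the expression separately within each group, so `ungroup()` it first.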

Upvotes: 1

Ronak Shah

Reputation: 389175

You can left_join() the sampled ids back to the original df to get all the rows for the randomly selected ids:

library(dplyr)
set.seed(123)
df %>% distinct(id) %>% sample_n(3) %>% left_join(df, by = "id")

#   id  score
#1   b  1.063
#2   b  1.370
#3   b  0.528
#4   b  0.403
#5   f  0.343
#6   f -1.286
#7   f -0.534
#8   f  0.597
#9   c  1.168
#10  c  0.793
#11  c -1.214
#12  c -1.006
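An alternative sketch using `semi_join()`, dplyr's filtering join, in place of `left_join()`: it keeps the rows of `df` whose `id` appears in the sampled set, adds no columns, and returns them in `df`'s original row order. It assumes dplyr >= 1.0.0 for `slice_sample()`.

```r
library(dplyr)

set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))

# One row per sampled id (slice_sample() supersedes sample_n())
sampled <- df %>% distinct(id) %>% slice_sample(n = 3)

# semi_join() keeps rows of df with a matching id in `sampled`,
# without adding any columns from `sampled`
smallDF <- df %>% semi_join(sampled, by = "id")
smallDF
```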

Upvotes: 3
