Reputation: 3568
I am trying to select all rows in a repeated-measures dataset that belong to a randomly selected group of people. I am trying to do it entirely in the tidyverse (for my own edification) but find myself having to fall back on base R functions. Here is how I do it with a combination of base R and dplyr commands.
library(dplyr)  # provides %>% and filter()
set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))
ids <- sample(unique(df$id), 3)
smallDF <- df %>% dplyr::filter(id %in% ids)
smallDF
# id score
# 1 a 0.6869129
# 2 a 1.0663631
# 3 a 0.5367006
# 4 a 1.9060287
# 5 c 1.1677516
# 6 c 0.7926794
# 7 c -1.2135038
# 8 c -1.0056141
# 9 d 0.2085696
# 10 d 0.4461776
# 11 d -0.6208060
# 12 d 0.4413429
I can sample randomly from the id column using dplyr...
df %>% distinct(id) %>% sample_n(3)
# id
# 1 e
# 2 c
# 3 b
...but because the output is a data frame/tibble rather than a vector, I am having trouble with the next step: filtering the original df by the randomly selected ids.
Can anyone help?
Upvotes: 3
Views: 601
Reputation: 389175
You can left_join the sampled ids back to the original df to get all the rows for the randomly selected ids.
library(dplyr)
set.seed(123)
df %>% distinct(id) %>% sample_n(3) %>% left_join(df)
#Joining, by = "id"
# id score
#1 b 1.063
#2 b 1.370
#3 b 0.528
#4 b 0.403
#5 f 0.343
#6 f -1.286
#7 f -0.534
#8 f 0.597
#9 c 1.168
#10 c 0.793
#11 c -1.214
#12 c -1.006
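Two other dplyr-only routes may also be worth trying; the following is a minimal sketch rather than a definitive approach. It assumes the same df as in the question, and the names sampled and smallDF2 are made up for illustration. pull() extracts the id column as a plain vector, so the original filter() call works unchanged, while semi_join() keeps only the rows of df whose id appears in the sampled table.

```r
library(dplyr)

# Same example data as in the question
set.seed(145)
df <- data.frame(id = rep(letters[1:10], each = 4),
                 score = rnorm(40))

# Option 1: pull() turns the one-column data frame into a vector,
# which can then be used with %in% inside filter()
ids <- df %>% distinct(id) %>% sample_n(3) %>% pull(id)
smallDF <- df %>% filter(id %in% ids)

# Option 2: keep the sampled ids as a data frame and use a filtering join;
# semi_join() returns the rows of df that have a match in `sampled`
sampled <- df %>% distinct(id) %>% sample_n(3)
smallDF2 <- df %>% semi_join(sampled, by = "id")
```

Unlike left_join, which returns rows in the order of the sampled ids, both options here keep the rows in the order they appear in df.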
Upvotes: 3