Reputation: 726
Say I have a data frame that is grouped by two factors. Is there a way to sample whole groups of data with dplyr? (Note: not sample within groups.)
Example:
DF <- data.frame(A = rep(LETTERS[1:4], each = 6),
                 B = rep(c(1:2), 12),
                 C = rnorm(24))
# base R solution
DF$group_var <- paste(DF$A, DF$B, sep = "_")
DF_sample <- DF[DF$group_var %in% sample(unique(DF$group_var), 3), ]
# possible dplyr solution? (sample_group_of_data() is hypothetical, it does not exist)
DF_sample <- DF %>% group_by(A, B) %>% sample_group_of_data(3)
Upvotes: 3
Views: 688
Reputation: 155
I found Vincent's solution in the comments to be the one I needed, so I am adding it as an additional answer. It relies on the group_var column created in the question's base R code.
DF %>% filter(group_var %in% sample(unique(DF$group_var), 3, replace = FALSE))
Vincent, I owe you a +1.
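If you would rather not build the pasted group_var column first, here is a minimal sketch of the same idea using the grouping columns directly. It assumes dplyr 1.0.0 or later (for slice_sample()); the names sampled_keys and the set.seed() call are only there for illustration.

```r
library(dplyr)

set.seed(1)  # only so the illustration is reproducible
DF <- data.frame(A = rep(LETTERS[1:4], each = 6),
                 B = rep(1:2, 12),
                 C = rnorm(24))

# Draw 3 of the distinct (A, B) combinations ...
sampled_keys <- DF %>% distinct(A, B) %>% slice_sample(n = 3)

# ... then keep every row that belongs to one of the sampled combinations
DF_sample <- DF %>% semi_join(sampled_keys, by = c("A", "B"))
```

The sampling happens on the table of group keys, so each group is either kept whole or dropped whole.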
Upvotes: 3
Reputation: 70266
Here's another pipe solution. It works irrespective of whether the data is grouped or not:
DF %>% split(interaction(.$A, .$B)) %>% sample(3) %>% bind_rows()
# Source: local data frame [9 x 3]
#
# A B C
# (fctr) (int) (dbl)
# 1 B 1 0.2623781
# 2 B 1 -0.8193225
# 3 B 1 0.3348400
# 4 D 1 1.0744650
# 5 D 1 1.3528529
# 6 D 1 0.3016770
# 7 A 2 -0.1920754
# 8 A 2 0.6917583
# 9 A 2 0.1985326
The pipe itself is pretty self-explanatory, I believe.
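In case it helps, here is the same pipe unrolled into separate steps, as a sketch. The intermediate names (groups, picked) are just for illustration, and drop = TRUE is my addition: it should keep A/B combinations that never occur in the data from showing up as empty groups that could be sampled.

```r
library(dplyr)

set.seed(42)  # only so the illustration is reproducible
DF <- data.frame(A = rep(LETTERS[1:4], each = 6),
                 B = rep(1:2, 12),
                 C = rnorm(24))

# 1. Split into a named list with one data frame per (A, B) combination
groups <- split(DF, interaction(DF$A, DF$B, drop = TRUE))

# 2. sample() on a list picks list elements, i.e. whole groups
picked <- sample(groups, 3)

# 3. Stack the sampled groups back into a single data frame
DF_sample <- bind_rows(picked)
```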
Upvotes: 4
Reputation: 680
Probably not as pretty as you would have wanted, and it's kind of cheating, but here's my solution:
DF %>% group_by(A, B) %>%
magrittr::extract(unlist(sample(attr(., "indices"), 5))+1, )
I use the "indices" attribute, which gives the row indices of each group as a list. I sample this list, unlist it, and add 1 (it seems those indices start at 0).
I then use the magrittr extract function, which stands in for the [] operator. In that sense I am kind of cheating: effectively, I have just rewritten your base R approach with the pipe, using the attributes of the grouped dplyr data frame.
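Since this leans on an internal attribute, it may not survive dplyr upgrades; as far as I know, newer versions (0.8.0 and later) no longer store a 0-based "indices" attribute. A rough sketch of the same idea using only exported helpers (n_groups() and group_indices()) could look like the following. The names grouped and keep are just for illustration, and I sample 3 groups here as in the question.

```r
library(dplyr)

set.seed(7)  # only so the illustration is reproducible
DF <- data.frame(A = rep(LETTERS[1:4], each = 6),
                 B = rep(1:2, 12),
                 C = rnorm(24))

grouped <- DF %>% group_by(A, B)

# Pick 3 group ids out of 1..n_groups
keep <- sample.int(n_groups(grouped), 3)

# group_indices() gives the group id of every row; keep rows of sampled groups
DF_sample <- grouped[group_indices(grouped) %in% keep, ]
```

The group ids are drawn once, outside of any grouped verb, so the sample is not redrawn per group.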
Upvotes: 2