hachiko

Reputation: 757

sample from dataframe, keeping all observations from sampled groups

How can I draw a random sample from a data frame while keeping together all the items that belong to the same group? I want to sample groups rather than rows, so that every sampled group comes back with all of its items.

Here is one way of sampling from mtcars. It gives me two random rows:

(sampled_df <- mtcars[sample(nrow(mtcars), 2), ])

I can take mtcars and then number it as though there are groups. mtcars has 32 observations. Here I'm saying that there are eight groups with four items each.

library(dplyr)

mtcars %>%
  mutate(number = rep(1:8,each=4)) %>%
  group_by(number) %>%
  sample_n(2)

The last two lines of code aren't doing what I hoped they would. I'm trying to get eight rows as output: all four of the observations from two of the groups.

My real data is invoice data, and I want to make the data frame smaller while making sure that the basket sizes stay the same.

Upvotes: 1

Views: 86

Answers (1)

deschen

Reputation: 11016

What you might want is to sample two of the group numbers and then keep every row whose group number is in that sample:

mtcars %>%
  mutate(number = rep(1:8,each=4)) %>%
  filter(number %in% sample(1:8, 2))
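
For real invoice data the group IDs probably won't be a tidy 1:8, so the same idea might look roughly like the sketch below. The data frame `invoices` and the columns `invoice_id` and `item` are made-up names for illustration, not something from your data; the point is only that the IDs are sampled from the distinct values rather than from the rows.

```r
library(dplyr)

set.seed(42)  # only so the sketch is reproducible

# Hypothetical invoice data: invoice_id and item are assumed column names
invoices <- tibble(
  invoice_id = c("A1", "A1", "A1", "B7", "B7", "C3", "D9", "D9", "D9", "D9"),
  item       = c("pen", "ink", "pad", "pen", "cup", "ink", "pad", "cup", "pen", "ink")
)

# Draw 2 invoice IDs from the distinct IDs, not from the rows
sampled_ids <- sample(unique(invoices$invoice_id), 2)

# Keep every row that belongs to a sampled invoice,
# so each sampled basket keeps its original size
sampled_invoices <- invoices %>%
  filter(invoice_id %in% sampled_ids)
```

Sampling from `unique(invoices$invoice_id)` gives every invoice the same chance of being picked regardless of how many rows it has. One thing to watch: if the ID column is numeric and only one distinct ID is left, `sample()` treats that single number as `1:n`, so converting the IDs to character first is the safer route.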

Upvotes: 3
