Noor van Sprang

Reputation:

How to create a dataframe by sampling 1 case (row) from each group in R

I would like to randomly select 1 case (so 1 row from a dataframe) from each group in R, but I cannot work out how to do it.

My data is structured in long format: 400 cases (rows) clustered within 250 groups (some groups contain only a single case, others 2, 3, 4, 5, or even 6). What I would like to end up with is a dataframe containing 250 rows, with each row representing 1 randomly selected case from each of the 250 groups.

I have the idea that I should use the sample function for this, but I could not work out how to do it. Does anyone have any ideas?

Upvotes: 2

Views: 108

Answers (1)

whuber

Reputation: 2494

Suppose your data frame X indicates group membership with a variable named "Group," as in this synthetic example:

G <- 8
set.seed(17)
X <- data.frame(Group=sort(sample.int(G, G, replace=TRUE)),
                Case=1:G)

Here is a printout of X:

  Group Case
1     2    1
2     2    2
3     2    3
4     4    4
5     4    5
6     5    6
7     7    7
8     8    8

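Randomly permute the rows of X, then pick the first instance of each value of "Group" using the duplicated function. Because the row order is random, the first row encountered within each group is a uniformly random member of that group: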

Y <- X[sample.int(nrow(X)), ]
Y[!duplicated(Y$Group), ]
  Group Case
8     8    8
1     2    1
4     4    4
6     5    6
7     7    7

A comparison to X indicates random cases in each group were selected. Repeat these last two steps to confirm this if you like.
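
As a rough sketch of the "repeat to confirm" suggestion, the two steps can be wrapped in a small function and run many times. The helper name sample_one_per_group and its group_col argument are hypothetical (they are not part of the answer above), and the choice of 2000 repetitions is arbitrary:

```r
# Hypothetical helper (not part of the original answer): shuffle the rows,
# then keep the first row seen for each value of the grouping column.
sample_one_per_group <- function(df, group_col) {
  shuffled <- df[sample.int(nrow(df)), , drop = FALSE]
  shuffled[!duplicated(shuffled[[group_col]]), , drop = FALSE]
}

# Rebuild the synthetic example from above.
G <- 8
set.seed(17)
X <- data.frame(Group = sort(sample.int(G, G, replace = TRUE)), Case = 1:G)

# A single draw should contain exactly one row per distinct group.
Z <- sample_one_per_group(X, "Group")
stopifnot(nrow(Z) == length(unique(X$Group)))

# Repeat the draw many times and tabulate which case is chosen for the
# first group in X; the counts should come out roughly equal across the
# cases belonging to that group.
g <- X$Group[1]
picks <- replicate(2000, {
  draw <- sample_one_per_group(X, "Group")
  draw$Case[draw$Group == g]
})
table(picks)
```

For the data in the question, the same call would presumably take the real data frame and the name of its grouping column, and the result should have 250 rows.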

Upvotes: 1
