Reputation:
I would like to randomly select 1 case (so 1 row from a dataframe) from each group in R, but I cannot work out how to do it.
My data is structured in long format: 400 cases (rows) clustered within 250 groups (some groups contain only a single case, others 2, 3, 4, 5, or even 6). So what I would like to end up with is a dataframe containing 250 rows, with each row representing one randomly selected case from each of the 250 groups.
I have the idea that I should use the sample function for this, but I couldn't work out how to do it. Does anyone have any ideas?
Upvotes: 2
Views: 108
Reputation: 2494
Suppose your data frame X indicates group membership with a variable named "Group," as in this synthetic example:
G <- 8
set.seed(17)
X <- data.frame(Group=sort(sample.int(G, G, replace=TRUE)),
                Case=1:G)
Here is a printout of X:
  Group Case
1     2    1
2     2    2
3     2    3
4     4    4
5     4    5
6     5    6
7     7    7
8     8    8
Pick up the first instance of each value of "Group" using the duplicated function after randomly permuting the rows of X:
Y <- X[sample.int(nrow(X)), ]
Y[!duplicated(Y$Group), ]
  Group Case
8     8    8
1     2    1
4     4    4
6     5    6
7     7    7
A comparison to X indicates that a random case was selected in each group. Repeat these last two steps to confirm this if you like.
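Since the question mentions the sample function, here is a rough sketch of a per-group alternative to the permute-and-deduplicate approach. It assumes the same X with a "Group" column as above; the helper name pick_one_per_group and its group_col argument are made up for illustration. It uses sample.int on each group's size rather than sample on the row indices, because sample(x, 1) treats a single number x as 1:x, which would silently pick wrong rows in groups that contain only one case.

```r
# Hypothetical helper (not from the original answer): draw one random row
# from each group of a data frame.
pick_one_per_group <- function(df, group_col = "Group") {
  # Split the row indices (not the rows themselves) by group;
  # drop = TRUE skips unused factor levels, which would be empty groups.
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]], drop = TRUE)
  # Pick one position within each group. sample.int() on the group size
  # stays correct even when a group has a single row.
  chosen <- vapply(idx_by_group,
                   function(idx) idx[sample.int(length(idx), 1)],
                   integer(1))
  df[chosen, , drop = FALSE]
}

# Same synthetic data as in the answer above.
G <- 8
set.seed(17)
X <- data.frame(Group = sort(sample.int(G, G, replace = TRUE)),
                Case = 1:G)

Z <- pick_one_per_group(X)
Z
```

Unlike the duplicated approach, the rows of Z come back in group order rather than in permuted order. Both approaches give every case in a group the same chance of being picked.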
Upvotes: 1