Dan

Reputation: 71

R (Stratified) Random Sampling for Defined Cases

I have a data frame:

DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", "HI", "IJ", "JK", "KL", "LM"),
                 ID    = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1))

My question: I would like to create a new column that holds a binary random number (0 or 1) for the rows where ID == 1, with a fixed proportion (a predefined prevalence) of each value, e.g., two 0s and four 1s across the six cases.

EDIT I: For non-case specific purposes, the solution might be:

DF$RANDOM[sample(1:nrow(DF), nrow(DF), FALSE)] <- rep(RANDOM, c(nrow(DF)-4,4))

But I still need the case-specific assignment, and the solution above does not explicitly refer to '0' or '1'.

(Note: The variable 'Value' is not relevant to the question; it is only an identifier.)

I found related posts on stratified sampling and random row selection, but neither those nor other posts cover this question.

Thank you VERY much in advance.

Upvotes: 0

Views: 348

Answers (2)

Yiran Wang

Reputation: 1

library(dplyr)
DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", 
                           "HI", "IJ", "JK", "KL", "LM"),
                 ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1), 
                 stringsAsFactors = FALSE)
# Draw 4 rows at random from each ID group, without replacement
DF %>% group_by(ID) %>% sample_n(4, replace = FALSE)
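
Note that the call above selects four rows per ID group; it does not add a 0/1 column. If a new column is what is wanted, a dplyr variant could look roughly like the sketch below. The column name RANDOM, the result name DF2, and the seed are assumptions chosen for illustration, and the counts (two 0s, four 1s) are taken from the question.

```r
library(dplyr)

DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH",
                           "HI", "IJ", "JK", "KL", "LM"),
                 ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
                 stringsAsFactors = FALSE)

set.seed(1)  # arbitrary seed, only for reproducibility

# Two 0s and four 1s in random order; the length must match the number of ID == 1 rows
labels <- sample(rep(c(0, 1), c(2, 4)))

# Start from an all-NA column and fill only the rows where ID == 1
DF2 <- DF %>%
  mutate(RANDOM = replace(rep(NA_real_, n()), ID == 1, labels))
```

Rows with ID == 0 keep NA in RANDOM, and the six case rows receive exactly two 0s and four 1s.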

Upvotes: 0

YOLO

Reputation: 21709

You can first subset the rows where ID == 1. To guarantee the exact number of 0s and 1s, build the vector of labels with rep() and shuffle it with sample(), setting replace = FALSE.
Here's a solution:

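library(data.table)
setDT(DF)      # convert the data.frame to a data.table by reference
set.seed(121)  # for reproducibility
# Shuffle two 0s and four 1s over the six rows where ID == 1; all other rows stay NA
DF[ID == 1, new_column := sample(rep(c(0, 1), c(2, 4)), .N, replace = FALSE)]
print(DF)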

     Value ID new_column
 1:    AB  1          1
 2:    BC  0         NA
 3:    CD  0         NA
 4:    DE  1          1
 5:    EF  0         NA
 6:    FG  1          1
 7:    GH  1          1
 8:    HI  0         NA
 9:    IJ  0         NA
10:    JK  1          0
11:    KL  0         NA
12:    LM  1          0
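
For readers who would rather avoid the data.table dependency, the same idea can be sketched in base R. This is only an illustration under the question's assumptions (six rows with ID == 1, two 0s and four 1s); the helper names idx and labels are made up here.

```r
DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH",
                           "HI", "IJ", "JK", "KL", "LM"),
                 ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1))

set.seed(121)  # for reproducibility

idx    <- which(DF$ID == 1)        # positions of the case rows
labels <- rep(c(0, 1), c(2, 4))    # fixed composition: two 0s and four 1s

# Guard: the label vector must have one entry per case row
stopifnot(length(labels) == length(idx))

DF$new_column      <- NA_real_        # non-case rows stay NA
DF$new_column[idx] <- sample(labels)  # random permutation of the labels

# Check the composition: two 0s, four 1s, six NAs
table(DF$new_column, useNA = "ifany")
```

Because sample() only permutes the fixed label vector, the proportion of 0s and 1s among the cases is always exactly as specified; only their positions are random.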

Upvotes: 1
