Reputation: 71
I have a data frame:
DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", "HI", "IJ", "JK", "KL", "LM"),
ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1))
My question: I would like to create a new column that includes a (binary) random number ('0' or '1') for cases where 'ID' == 1, with a fixed proportion (or pre-defined prevalence), e.g., random numbers '0' x 2 and '1' x 4.
EDIT I: For non-case specific purposes, the solution might be:
DF$RANDOM[sample(1:nrow(DF), nrow(DF), FALSE)] <- rep(RANDOM, c(nrow(DF)-4,4))
But I still need the case-specific assignment, and the aforementioned solution does not explicitly refer to '0' or '1'.
(Note: The variable 'Value' is not relevant for the question; it is only an identifier.)
I found relevant posts on stratified sampling and random row selection, but this question is not covered by those (or other) posts.
Thank you VERY much in advance.
Upvotes: 0
Views: 348
Reputation: 1
library(dplyr)
DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH",
"HI", "IJ", "JK", "KL", "LM"),
ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
stringsAsFactors = FALSE)
DF %>% group_by(ID) %>% sample_n(4, replace = FALSE)
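Note that sample_n() as used above returns four sampled rows per ID group rather than adding a column. A possible dplyr sketch that adds the column instead is shown here; the column name RANDOM, the result name DF2, and the seed are my own assumptions, and the counts c(2, 4) only work because this example has exactly six rows with ID == 1.

```r
library(dplyr)

DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH",
                           "HI", "IJ", "JK", "KL", "LM"),
                 ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
                 stringsAsFactors = FALSE)

set.seed(1)  # arbitrary seed, only for reproducibility

# Start from an all-NA column, then overwrite the ID == 1 positions
# with a random permutation of two 0s and four 1s.
DF2 <- DF %>%
  mutate(RANDOM = replace(rep(NA_real_, n()),
                          ID == 1,
                          sample(rep(c(0, 1), c(2, 4)))))

DF2
```

Rows with ID == 0 keep NA in RANDOM; among the six cases there are always two 0s and four 1s, only their order is random.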
Upvotes: 0
Reputation: 21709
You can subset the data first by case ID == 1. To ensure the required number of 1s and 0s, we use the rep function and set replace to FALSE in the sample function.
Here's a solution.
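library(data.table)
setDT(DF)  # convert the data.frame to a data.table by reference so := works
set.seed(121)
# For rows with ID == 1, shuffle a vector of two 0s and four 1s
DF[ID == 1, new_column := sample(rep(c(0, 1), c(2, 4)), .N, replace = FALSE)]
print(DF)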
Value ID new_column
1: AB 1 1
2: BC 0 NA
3: CD 0 NA
4: DE 1 1
5: EF 0 NA
6: FG 1 1
7: GH 1 1
8: HI 0 NA
9: IJ 0 NA
10: JK 1 0
11: KL 0 NA
12: LM 1 0
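If you would rather not load data.table, the same idea can be sketched in base R. This is only a minimal version under my own assumptions: the column name RANDOM and the seed are arbitrary choices, and the pool of two 0s and four 1s must match the number of rows with ID == 1 (six in this example).

```r
DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH",
                           "HI", "IJ", "JK", "KL", "LM"),
                 ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1))

set.seed(42)                      # arbitrary seed, only for reproducibility

cases <- which(DF$ID == 1)        # row indices of the cases
pool  <- rep(c(0, 1), c(2, 4))    # exactly two 0s and four 1s

# Guard: the fixed counts must add up to the number of cases
stopifnot(length(pool) == length(cases))

DF$RANDOM <- NA_real_             # non-cases stay NA
DF$RANDOM[cases] <- sample(pool)  # sample() without size permutes the pool

table(DF$RANDOM, useNA = "ifany")
```

Because the pool is shuffled rather than drawn with replacement, the counts are fixed (two 0s, four 1s, six NAs) and only their positions among the cases vary with the seed.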
Upvotes: 1