Reputation: 6759
df <- data.frame(
  id = 1:12,
  day = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  endpoint = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1)
)
df
#>    id day endpoint
#> 1   1   1        1
#> 2   2   1        1
#> 3   3   1        1
#> 4   4   1        1
#> 5   5   2        2
#> 6   6   2        2
#> 7   7   2        2
#> 8   8   2        2
#> 9   9   3        1
#> 10 10   3        1
#> 11 11   3        1
#> 12 12   3        1
In the data above, some patients (id) reach the endpoint each day. For each day, I am trying to randomly select as many patients as that day's endpoint value and mark them with s = 1. On a given day, the ids from that day and from previous days are eligible, as long as they have not been selected already. The following code does what I expect, but I have to enter the day and endpoint values manually. Any suggestions on how to pick those values directly from the data would be appreciated.
library(dplyr)
df$s <- 0
df$s <- ifelse(df$id %in% sample_n(df[df$day <= 1 & df$s == 0, ], 1)$id, 1, df$s)
df$s <- ifelse(df$id %in% sample_n(df[df$day <= 2 & df$s == 0, ], 2)$id, 1, df$s)
df$s <- ifelse(df$id %in% sample_n(df[df$day <= 3 & df$s == 0, ], 1)$id, 1, df$s)
df
#>    id day endpoint s pick_day
#> 1   1   1        1 0        0
#> 2   2   1        1 1        2
#> 3   3   1        1 1        1
#> 4   4   1        1 1        3
#> 5   5   2        2 1        2
#> 6   6   2        2 0        0
#> 7   7   2        2 0        0
#> 8   8   2        2 0        0
#> 9   9   3        1 0        0
#> 10 10   3        1 0        0
#> 11 11   3        1 0        0
#> 12 12   3        1 0        0
Is it also possible to add a variable showing the day on which a row was picked, like the pick_day variable above? Thanks.
Upvotes: 1
Views: 85
Reputation: 388807
One way in base R, using a for loop:
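df$s <- 0
set.seed(123)  # fixed seed so the draw is reproducible
for (i in unique(df$day)) {
  # Eligible rows: day i or earlier, not selected yet
  temp <- subset(df, day <= i & s == 0)
  # Draw as many ids as the endpoint value recorded for day i
  ids <- with(temp, sample(id, endpoint[day == i][1]))
  df$s[df$id %in% ids] <- 1
}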
df
#   id day endpoint s
#1   1   1        1 0
#2   2   1        1 0
#3   3   1        1 1
#4   4   1        1 1
#5   5   2        2 1
#6   6   2        2 0
#7   7   2        2 0
#8   8   2        2 1
#9   9   3        1 0
#10 10   3        1 0
#11 11   3        1 0
#12 12   3        1 0
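The question also asks for a pick_day column, which the loop above does not create. A possible extension is sketched below; it is an illustration under stated assumptions rather than part of the original answer. It assumes 0 should mean "never picked" (as in the question's desired output), and the names pool, n and picked are introduced here only for the example. It also indexes into the pool with sample.int() instead of calling sample() on the ids directly, because sample(x, n) treats a single number x as 1:x, which could give wrong ids if only one eligible id were left.

```r
# Rebuild the example data from the question
df <- data.frame(
  id = 1:12,
  day = rep(1:3, each = 4),
  endpoint = rep(c(1, 2, 1), each = 4)
)

df$s <- 0
df$pick_day <- 0  # assumption: 0 means "never picked"

set.seed(123)
for (i in unique(df$day)) {
  # Eligible pool: ids seen up to day i that have not been picked yet
  pool <- df$id[df$day <= i & df$s == 0]
  # Number of ids to draw on day i, read from the data
  n <- df$endpoint[df$day == i][1]
  # Draw positions in the pool, then map them back to ids
  ids <- pool[sample.int(length(pool), n)]
  picked <- df$id %in% ids
  df$s[picked] <- 1
  df$pick_day[picked] <- i  # record the day of the draw
}
df
```

Because ids from earlier days stay eligible, pick_day can be later than a row's own day, but never earlier.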
Upvotes: 2