Reputation: 373
I want to block randomize my data into 3 arms, balancing both gender and smoking status as well as possible.
Here is some simulated data similar to my actual data. Note that males & females and smokers & non-smokers are unevenly sampled.
set.seed(33)
# The 50-element columns are recycled to match the 100-element smoker column.
mydata <- data.frame("gender" = rep(c("female", "male"), times = c(40, 10)),
                     "smoker" = rep(c("yes", "no"), each = 50),
                     "measurement" = rnorm(n = 50, mean = 15, sd = 3),
                     "outcome of interest" = rep(c("positive", "negative"), times = c(20, 30)))
head(mydata)
# gender smoker measurement outcome.of.interest
# 1 female yes 12.309256 positive
# 2 female yes 15.554548 positive
# 3 female yes 19.763536 positive
# 4 female yes 11.608873 positive
# 5 female yes 14.759245 positive
# 6 female yes 15.39726 positive
I found the randomizr package useful for randomizing according to 1 variable, but I get an unbalanced distribution of the other:
set.seed(2)
library(randomizr)
Z <- block_ra(blocks = mydata[,"gender"], num_arms = 3)
table(Z, mydata$gender)
# Z female male
# T1 26 7
# T2 27 6
# T3 27 7
table(Z, mydata$smoker)
# Z no yes
# T1 17 16
# T2 13 20
# T3 20 14
Z <- block_ra(blocks = mydata[,"smoker"], num_arms = 3)
table(Z, mydata$smoker)
# Z no yes
# T1 17 17
# T2 17 16
# T3 16 17
table(Z, mydata$gender)
# Z female male
# T1 29 5
# T2 24 9
# T3 27 6
How can I block randomize according to 2 or more parameters?
Upvotes: 1
Views: 215
Reputation: 46898
You can try something like this: group by gender and smoker first, then within each group randomize the order in which the arm labels 0, 1, 2 are assigned.
For example, we use
SUBSET = subset(mydata,gender=="female" & smoker=="yes")
For each row number, we take the remainder after division by 3:
1:nrow(SUBSET) %% 3
[1] 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
[39] 0 1
We end up with an almost equal number of 0s, 1s, and 2s. We can randomize the order of this assignment by doing
sample(1:nrow(SUBSET) %% 3)
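To make the counting argument concrete, here is a minimal base R sketch. The names `n`, `arms`, `counts`, and `shuffled` are illustrative only, and 40 is the size of the female/smoker subset in the simulated data. It shows that the remainders split the block almost evenly and that `sample()` only permutes the labels, so the counts per arm stay the same.

```r
# Illustrative sketch: n is the block size, arms holds the remainder labels.
n <- 40
arms <- seq_len(n) %% 3          # deterministic pattern 1 2 0 1 2 0 ...
counts <- table(arms)            # number of units each label receives
print(counts)

set.seed(1)                      # arbitrary seed, only for reproducibility
shuffled <- sample(arms)         # permutes the labels, does not redraw them
print(table(shuffled))           # same counts as before, in random order
```

Because the labels are permuted rather than drawn independently, the arm sizes within a block can never differ by more than one.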
You can use this approach in base R, with what @Dave2e proposed above, using a new column:
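# Split the data by gender x smoker and assign arms within each subset.
new = by(mydata,
         paste(mydata$gender, mydata$smoker),
         function(SUBSET){
           SUBSET$id = sample(1:nrow(SUBSET) %% 3)
           SUBSET
         })
# Stack the per-group data frames back into a single data frame.
new = do.call(rbind, new)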
You can also take a dplyr approach in the same way, except that instead of sample(1:nrow(SUBSET) %% 3) you use sample(1:n() %% 3):
set.seed(100)
library(dplyr)
new <- mydata %>%
  group_by(gender, smoker) %>%
  mutate(id = sample(1:n() %% 3)) %>%
  ungroup()
And we can check the distribution in each arm:
by(new,new$id,function(i)table(i$gender,i$smoker))
new$id: 0
         no yes
  female 13  13
  male    3   3
------------------------------------------------------------
new$id: 1
         no yes
  female 14  14
  male    4   4
------------------------------------------------------------
new$id: 2
         no yes
  female 13  13
  male    3   3
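In the output above, arm 1 receives the extra unit in every block, because the remainders always start at 1 and both block sizes here (40 and 10) leave a remainder of 1. One possible refinement, offered as a sketch and not as part of the approach above, is to start the cycle at a random offset per block so that leftover units land in a randomly chosen arm. The helper name `assign_arms` and its arguments are hypothetical, and the data frame only reproduces the two blocking columns of the simulated data.

```r
# Hypothetical helper: assign n units of one block to k arms (labels 0..k-1),
# starting the cycle at a random offset so the extra units vary by block.
assign_arms <- function(n, k = 3) {
  offset <- sample.int(k, 1) - 1            # random start in 0..k-1
  labels <- (seq_len(n) + offset) %% k      # near-equal counts per arm
  labels[sample.int(n)]                     # shuffle; safe even when n == 1
}

set.seed(33)
mydata <- data.frame(gender = rep(c("female", "male"), times = c(40, 10)),
                     smoker = rep(c("yes", "no"), each = 50))

# ave() applies the helper within each gender x smoker block.
mydata$id <- ave(seq_len(nrow(mydata)), mydata$gender, mydata$smoker,
                 FUN = function(i) assign_arms(length(i)))

# Within every block, arm sizes differ by at most one.
tab <- table(mydata$id, paste(mydata$gender, mydata$smoker))
print(tab)
```

Indexing with `sample.int(n)` instead of calling `sample(labels)` avoids the base R quirk where `sample(x)` on a single number `x` samples from `1:x`, which would matter for blocks containing a single row.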
Upvotes: 2