How to block randomize data according to more than one parameter using R

Question

I want to block randomize my data into 3 arms, balancing both gender and smoking status across the arms as well as possible.

Here is some simulated data similar to my actual data. Note that males & females and smokers & non-smokers are unevenly sampled.

set.seed(33)
# smoker has 100 values, so the 50-element columns are recycled: 100 rows in total
mydata <- data.frame("gender"=rep(c("female", "male"),  times=c(40,10)),
                     "smoker"=rep(c("yes", "no"), each=50),
                     "measurement"=rnorm(n=50, mean=15, sd=3),
                     "outcome of interest"= rep(c("positive", "negative"), times=c(20,30)))
head(mydata)
#     gender smoker measurement outcome.of.interest
# 1   female    yes   12.309256            positive
# 2   female    yes   15.554548            positive
# 3   female    yes   19.763536            positive
# 4   female    yes   11.608873            positive
# 5   female    yes   14.759245            positive
# 6   female    yes    15.39726            positive

I found the randomizr package useful for randomizing according to 1 variable, but the other variable then ends up unevenly distributed across the arms:

set.seed(2)
library(randomizr)
Z <- block_ra(blocks = mydata[,"gender"], num_arms = 3)
table(Z, mydata$gender)
# Z    female male
#   T1     26    7
#   T2     27    6
#   T3     27    7
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  16
#   T2 13  20
#   T3 20  14

Z <- block_ra(blocks = mydata[,"smoker"], num_arms = 3)
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  17
#   T2 17  16
#   T3 16  17
table(Z, mydata$gender)
# Z    female male
#   T1     29    5
#   T2     24    9
#   T3     27    6

How can I block randomize according to 2 or more parameters?

StupidWolf · Accepted Answer

You can try something like this: group the data by gender and smoker first, then within each group randomize the order in which the arm labels 0, 1, 2 are assigned.

For example, take the female smokers:

SUBSET = subset(mydata,gender=="female" & smoker=="yes")

For each row number, we take the remainder after division by 3:

1:nrow(SUBSET) %% 3
#  [1] 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
# [39] 0 1

This gives an almost equal number of 0s, 1s, and 2s. Shuffling the vector randomizes which row gets which label:

sample(1:nrow(SUBSET) %% 3)
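
To see why the shuffle keeps the arms balanced, here is a minimal sketch in base R. It assumes a group of 40 rows (the number of female smokers in the simulated data) through a stand-in variable n_rows instead of the real subset, and the seed is arbitrary.

```r
# n_rows is an assumed stand-in for nrow(SUBSET)
n_rows <- 40
labels <- 1:n_rows %% 3        # 1 2 0 1 2 0 ... in a fixed order

set.seed(1)                    # arbitrary seed, only for reproducibility
shuffled <- sample(labels)     # same labels in a random order

# sample() only permutes the vector, so the label counts are unchanged
table(labels)
table(shuffled)
```

The counts per label are fixed by the remainder step; sample() decides only which row receives which label.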

You can apply this in base R, along the lines of what @Dave2e proposed above, storing the assignment in a new column:

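# Split by gender/smoker combination, assign arms within each subset, then recombine
new = by(mydata,
         paste(mydata$gender, mydata$smoker),
         function(SUBSET){
           SUBSET$id = sample(1:nrow(SUBSET) %% 3)  # shuffled arm labels 0, 1, 2
           SUBSET
         })
new = do.call(rbind, new)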
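
One side effect of by() followed by rbind() is that the rows come back grouped by block, with new row names. If the original row order matters, a possible alternative is base R's ave(), which applies a function per group and writes the result back in place. This is only a sketch: it rebuilds the gender and smoker columns of the simulated data by hand, and the seed is arbitrary.

```r
# Rebuild the two blocking columns of the simulated data (100 rows)
mydata <- data.frame(
  gender = rep(rep(c("female", "male"), times = c(40, 10)), 2),
  smoker = rep(c("yes", "no"), each = 50)
)

set.seed(100)  # arbitrary seed, only for reproducibility

# ave() splits the row indices by gender x smoker, runs FUN on each piece,
# and puts the results back at the original row positions
mydata$id <- ave(
  seq_len(nrow(mydata)), mydata$gender, mydata$smoker,
  FUN = function(idx) sample(seq_along(idx) %% 3)
)

# Arms (rows) against blocks (columns)
table(mydata$id, paste(mydata$gender, mydata$smoker))
```

The assignment logic is the same as in the by() version; only the bookkeeping differs.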

You can also take a dplyr approach in the same way, except that instead of sample(1:nrow(SUBSET) %% 3) you use sample(1:n() %% 3):

set.seed(100)
library(dplyr)
new <- mydata %>%
  group_by(gender, smoker) %>%
  mutate(id = sample(1:n() %% 3)) %>%
  ungroup()

And we can check the distribution in each arm:

by(new,new$id,function(i)table(i$gender,i$smoker))

new$id: 0

         no yes
  female 13  13
  male    3   3
------------------------------------------------------------ 
new$id: 1

         no yes
  female 14  14
  male    4   4
------------------------------------------------------------ 
new$id: 2

         no yes
  female 13  13
  male    3   3
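
Because 1:n %% 3 starts at 1 in every block, any leftover rows always go to arm 1 first. In the output above, arm 1 therefore has 36 rows against 32 each for arms 0 and 2. If that matters, one possible tweak (not part of the answer) is to add a random offset per block before taking the remainder. The helper assign_arms below is hypothetical, and the data columns are rebuilt by hand to keep the sketch self-contained.

```r
# Hypothetical helper: a random offset per block, so the "extra" rows
# of a block do not always land in arm 1
assign_arms <- function(n, num_arms = 3) {
  offset <- sample(num_arms, 1) - 1              # 0, 1, ..., num_arms - 1
  labels <- (seq_len(n) + offset) %% num_arms
  labels[sample.int(n)]                          # shuffle; also safe when n == 1
}

# Blocking columns of the simulated data (100 rows)
gender <- rep(rep(c("female", "male"), times = c(40, 10)), 2)
smoker <- rep(c("yes", "no"), each = 50)

set.seed(100)  # arbitrary seed, only for reproducibility
arm <- ave(seq_along(gender), gender, smoker,
           FUN = function(idx) assign_arms(length(idx)))

table(arm)                          # overall arm sizes
table(arm, paste(gender, smoker))   # balance within each block
```

Within each block the arm counts still differ by at most one. The overall arm sizes are not guaranteed to be equal, but the surplus is no longer systematically assigned to the same arm.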
