naco

Reputation: 373

How to block randomize data according to more than one parameter using R

I want to block randomize my data into 3 arms, balancing both gender and smoking status as well as possible.

Here is some simulated data similar to my actual data. Note that males & females and smokers & non-smokers are unevenly sampled.

set.seed(33)
mydata <- data.frame("gender"=rep(c("female", "male"),  times=c(40,10)),
                 "smoker"=rep(c("yes", "no"), each=50),
                 "measurement"=rnorm(n=50, mean=15, sd=3),
                 "outcome of interest"= rep(c("positive", "negative"), times=c(20,30)))
head(mydata)
#     gender smoker measurement outcome.of.interest
# 1   female    yes   12.309256            positive
# 2   female    yes   15.554548            positive
# 3   female    yes   19.763536            positive
# 4   female    yes   11.608873            positive
# 5   female    yes   14.759245            positive
# 6   female    yes    15.39726            positive

I found the randomizr package useful for randomizing according to 1 variable, but blocking on one variable leaves the other unevenly distributed across arms:

set.seed(2)
library(randomizr)
Z <- block_ra(blocks = mydata[,"gender"], num_arms = 3)
table(Z, mydata$gender)
# Z    female male
#   T1     26    7
#   T2     27    6
#   T3     27    7
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  16
#   T2 13  20
#   T3 20  14

Z <- block_ra(blocks = mydata[,"smoker"], num_arms = 3)
table(Z, mydata$smoker)
# Z    no yes
#   T1 17  17
#   T2 17  16
#   T3 16  17
table(Z, mydata$gender)
# Z    female male
#   T1     29    5
#   T2     24    9
#   T3     27    6

How can I block randomize according to 2 or more parameters at once?

Upvotes: 1

Views: 215

Answers (1)

StupidWolf

Reputation: 46898

You can try something like this: group by gender and smoker first, then, within each group, randomize the order in which the arm labels 0, 1, 2 are assigned.

For example, take one subset:

SUBSET = subset(mydata,gender=="female" & smoker=="yes")

For each row number, we take the remainder after division by 3:

1:nrow(SUBSET) %% 3
 [1] 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
[39] 0 1

We end up with an almost equal number of 0s, 1s, and 2s (the counts differ by at most one). We can then randomize which row gets which label by shuffling the vector:

sample(1:nrow(SUBSET) %% 3)
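
One thing to be aware of: with 1:nrow(SUBSET) %% 3 the leftover units always go to the same labels (with 40 rows, the extra one always lands on label 1). If that matters, a minimal sketch of a variant is below. The helper name assign_arms and the random offset are my own additions for illustration, not part of the approach above.

```r
# Hypothetical helper (name and offset trick are assumptions, not from the answer):
# assign n units to k arms labelled 0..k-1, with arm sizes differing by at most 1.
assign_arms <- function(n, k = 3) {
  offset <- sample.int(k, 1)            # random starting point in the 0..k-1 cycle
  arms <- (seq_len(n) + offset) %% k    # cyclic labels, so sizes are near-equal
  arms[sample.int(n)]                   # shuffle which unit receives which label
}

set.seed(1)
arms <- assign_arms(40)
table(arms)
```

Indexing with sample.int(n) instead of calling sample() on the label vector also avoids the surprise that sample(x) treats a single number x as 1:x, which could bite for groups with only one row.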

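You can use this approach in base R, with what @Dave2e proposed above, storing the arm in a new column:

# split by gender/smoker combination, assign arms within each piece
new = by(mydata,
         paste(mydata$gender, mydata$smoker),
         function(SUBSET){
           SUBSET$id = sample(1:nrow(SUBSET) %% 3)
           SUBSET
         })
# stack the pieces back into one data frame
new = do.call(rbind, new)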

You can also take a dplyr approach in the same way, except that instead of sample(1:nrow(SUBSET) %% 3) you use sample(1:n() %% 3):

set.seed(100)
library(dplyr)
new <- mydata %>%
  group_by(gender, smoker) %>%
  mutate(id = sample(1:n() %% 3)) %>%
  ungroup()

And we can check the distribution in each arm:

by(new,new$id,function(i)table(i$gender,i$smoker))

new$id: 0

         no yes
  female 13  13
  male    3   3
------------------------------------------------------------ 
new$id: 1

         no yes
  female 14  14
  male    4   4
------------------------------------------------------------ 
new$id: 2

         no yes
  female 13  13
  male    3   3

Upvotes: 2
