How to simulate MAR missing data in R?

Question

I would like to simulate some missing data in R but am having trouble. I have created two variables ('pre' and 'post') that represent a measurement for the same individual pre- and post-treatment (i.e. paired data). I have been able to do it for data that is Missing Completely at Random (MCAR) - see below, but am unable to figure out how to code it for Missing at Random (MAR). For the MAR missing data, I would like to create 3 categories based on the pre-treatment observations that will determine how many of the post-treatment observations are missing. i.e.

For pre > 25, 40% post missing
For pre > 21 and ≤ 25, 30% post missing
For pre ≤ 21, 20% post missing

Can anyone help out? (I'd be really grateful!)
Thanks

set.seed(80122)
n <- 1000

# Simulate 1000 people with high pre-treatment (mean 28, sd 3) and normal (mean 18, sd 3) post-treatment. Correlation between paired data = 0.7.
data <- rmvnorm(n,mean=c(28,18),sigma=matrix(c(9,0.7*sqrt(81),0.7*sqrt(81),9),2,2)) # Covariance matrix

# Split into pre and post treatment and check correlation is what was specified
pre <- data[, 1]
post <- data[, 2]
cor.test(pre,post)

# Simulate MCAR
mcar <- 1 - rbinom(n, 1, 0.2) # Will create ~ 20% zero's which we'll convert to NA's
post_mcar <- post
post_mcar[mcar == 0] <- mcar[mcar==0] # Replace post data with random zero's from mcar vector
post_mcar[mcar == 0] <- NA # Change zero's to NAs

How to simulate MAR missing data in R?

Answers (1)

Related Questions