Reputation: 19
I want to run simulations to estimate the bias in a linear model and in a linear mixed model. The bias is E(beta_hat) - beta, where beta is the true association between my X and Y.
I generated my X variable from a normal distribution and Y from a multivariate normal distribution.
meanY <- meanY + X*betaV
This is how I generated meanY (betaV is the effect size), which is then used to generate the multivariate Y outcome as shown below.
Y[jj,] <- rnorm(nRep, mean=meanY[jj], sd=sqrt(varY))
I understand how I can calculate E(beta_hat) from the simulations (the sum of the beta estimates across all simulations divided by the number of simulations), but I am not sure how I can estimate the true beta.
From my limited understanding, the true beta is not obtained from the data but from the simulation setting, where I set a fixed beta value.
Based on how I generated my data, how can I estimate the true beta?
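For concreteness, the computation I have in mind looks like this, where betaHats is just a placeholder name for the vector of beta estimates collected across the simulations:

EbetaHat <- mean(betaHats)   # simulation average, my estimate of E(beta_hat)
bias <- EbetaHat - betaV     # if betaV is indeed the true beta I fixed when generating Y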
Upvotes: 1
Views: 1896
Reputation: 8582
There are a couple of methods for simulating bias. I'll take an easy example using a linear model. A linear mixed model could likely use a similar approach; however, I am not certain the same would hold for a generalized linear mixed model.
A simple method for estimating bias, when working with a simple linear model, is to 'choose' the model to estimate one's bias from. Let's say, for example, Y = 3 + 4 * X + e. I have chosen beta <- c(3, 4), and as such I only need to simulate my data. For a linear model, the model assumptions are:
- Observations are independent
- Observations are normally distributed (equivalently, the errors are normal)
- The mean can be described by the linear predictor
Using these 3 assumptions, simulating a fixed design is simple.
set.seed(1)
xseq <- seq(-10, 10)
xlen <- length(xseq)
nrep <- 100
# Simulate X given a flat prior (uniformly distributed; a normal distribution would likely work fine as well)
X <- sample(xseq, size = xlen * nrep, replace = TRUE)
beta <- c(3, 4)
esd <- 1
emu <- 0
e <- rnorm(xlen * nrep, emu, esd)
Y <- cbind(1, X) %*% beta + e
fit <- lm(Y ~ X)
bias <- coef(fit) - beta
> bias
 (Intercept)            X
0.0121017239 0.0001369908
which indicates a small bias. To test whether this bias is significant, we could perform a Wald test or a t-test (or replicate the process 1000 times and check the distribution of outcomes).
# Simulate the linear model many times
model_frame <- cbind(1, X)
emany <- matrix(rnorm(xlen * nrep * 1000, emu, esd), ncol = 1000)
# Add the simulated noise; sweep adds the linear predictor model_frame %*% beta to every column of emany
Ymany <- sweep(emany, 1, model_frame %*% beta, "+")
# Fit all 1000 models simultaneously (lm is awesome: it accepts a matrix response!)
manyFits <- lm(Ymany ~ X)
# Plot the density of the fitted parameters
par(mfrow = c(1, 2))
plot(density(coef(manyFits)[1, ]), main = "Density of intercept")
plot(density(coef(manyFits)[2, ]), main = "Density of beta")
# Calculate the bias: sweep subtracts the true beta from each column of coef(manyFits),
# then rowMeans averages over the 1000 fits
biasOfMany <- rowMeans(sweep(coef(manyFits), 1, beta, "-"))
> biasOfMany
  (Intercept)             X
 5.896473e-06 -1.710337e-04
Here we see that the bias is reduced quite a bit and has changed sign for the X coefficient, giving reason to believe the bias is not significant.
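To make that check formal, we could t-test the replicated estimates against the true values (a minimal sketch reusing manyFits from above; the null hypothesis is that the mean estimate equals the true coefficient):

t.test(coef(manyFits)[1, ], mu = beta[1])  # test the intercept estimates against 3
t.test(coef(manyFits)[2, ], mu = beta[2])  # test the slope estimates against 4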
Changing the design would allow one to look into the bias of interactions, outliers and other features using the same method, as sketched below.
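For instance, a hedged sketch of the same recipe with an interaction term; all values are chosen purely for illustration:

set.seed(2)
n <- 2000
X1 <- rnorm(n)
X2 <- rnorm(n)
betaInt <- c(3, 4, -2, 0.5)                 # intercept, X1, X2, X1:X2 (illustrative)
Yint <- cbind(1, X1, X2, X1 * X2) %*% betaInt + rnorm(n)
coef(lm(Yint ~ X1 * X2)) - betaInt          # estimation error of a single fit

Replicating this, exactly as above, turns the single-fit error into a bias estimate.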
For linear mixed models, one could perform the same method; however, here you would have to design the random variables as well, which requires some more work, and the implementation of lmer, as far as I know, does not fit a model across all columns of Y. However, b (the random effects) could be simulated, and so could any noise parameters. Do note, however, that as b is a single vector containing a single outcome of the simulation (often drawn from a multivariate normal distribution), one would have to re-run the model for each simulation of b. Basically, this increases the number of times one has to re-run the model-fitting procedure in order to get a good estimate of the bias; a sketch of that loop is shown below.
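A minimal sketch of that idea, assuming a random-intercept model; the group count, random-effect sd and residual sd are illustrative values, not taken from the question:

library(lme4)

set.seed(1)
ngrp <- 20                  # number of groups (illustrative)
nper <- 10                  # observations per group (illustrative)
group <- rep(seq_len(ngrp), each = nper)
X <- rnorm(ngrp * nper)
beta <- c(3, 4)             # true fixed effects, as before
bsd <- 0.5                  # sd of the random intercepts (assumed)
esd <- 1                    # residual sd (assumed)
nsim <- 200                 # number of simulations
est <- matrix(NA_real_, nsim, 2)
for (s in seq_len(nsim)) {
  b <- rnorm(ngrp, 0, bsd)  # one draw of the random effects
  Y <- beta[1] + beta[2] * X + b[group] + rnorm(ngrp * nper, 0, esd)
  fit <- lmer(Y ~ X + (1 | group))
  est[s, ] <- fixef(fit)    # keep the fixed-effect estimates
}
colMeans(est) - beta        # simulated bias of the fixed effects

Each iteration re-fits lmer, so this is considerably slower than the matrix-response lm trick above, but the bias estimate is obtained the same way.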
Upvotes: 2