Reputation: 195
Simulate the conditions of linear regression and show that the estimates for multidimensional linear regression (three or more parameters) are unbiased. Then try to make the parameter estimates biased and show by simulation that you managed to achieve bias.
This is what I tried, but I am stuck on how to go from unbiased estimates to biased ones.
b0 <- 2
b1 <- 1.3
b2 <- 5
N <- 1000
matrica <- matrix(0, nrow = N, ncol = 3)
for (i in 1:N) {
  x1 <- rnorm(100)   ## expectation and variance are arbitrary
  x2 <- rnorm(100)
  err <- rnorm(100)
  y <- b0 + b1*x1 + b2*x2 + err
  fit <- lm(y ~ x1 + x2)   ## avoid calling the object `lm`, which masks the function
  matrica[i, 1] <- fit$coefficients[1]
  matrica[i, 2] <- fit$coefficients[2]
  matrica[i, 3] <- fit$coefficients[3]
}
head(matrica)   ## inspect the first few rows of estimates
rez1 <- matrica[, 1]
rez2 <- matrica[, 2]
rez3 <- matrica[, 3]
## now we need to show that the estimates are unbiased (b0 ~ mean(rez1), etc.)
summary(rez1)
summary(rez2)
summary(rez3)
cor(rez1, rez2)   ## highly correlated
cor(rez2, rez3)   ## highly correlated
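A more direct numeric check (a quick sketch: each column mean should sit near the corresponding true coefficient, up to simulation error):
colMeans(matrica)                  ## compare with c(b0, b1, b2)
colMeans(matrica) - c(b0, b1, b2)  ## differences should be near zero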
Upvotes: 2
Views: 680
Reputation: 2368
OK, so similar to the way you started, you could do the following:
# True values
b0 <- 2
b1 <- 1.3
b2 <- 5
# Simulation set points
N <- 1000   # number of simulated data sets
n <- 100    # observations per data set
set.seed(42)
collector <- matrix(ncol = 3, nrow = N)
colnames(collector) <- c("b0_hat", "b1_hat", "b2_hat")
for (i in 1:N) {
  # Generate data
  x1 <- rnorm(n, mean = 1, sd = 1)
  x2 <- rnorm(n, mean = 1, sd = 1)
  y_hat <- b0 + b1 * x1 + b2 * x2
  # Add noise
  y <- rnorm(n, y_hat, 1)
  # Fit data
  fit <- lm(y ~ x1 + x2)
  # Store results
  collector[i, ] <- fit$coefficients
}
Then to show the results, you could plot histograms and check that the mean of the estimates approaches the true parameter value, beta.
# Graphs to show unbiasedness
par(mfrow = c(3, 1))
hist(collector[, 1], main = expression(hat(beta)[0]), breaks = 30)
abline(v = b0, col = "red", lwd = 2)
hist(collector[, 2], main = expression(hat(beta)[1]), breaks = 30)
abline(v = b1, col = "red", lwd = 2)
hist(collector[, 3], main = expression(hat(beta)[2]), breaks = 30)
abline(v = b2, col = "red", lwd = 2)
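As a numeric companion to the plots (the averages should land close to the true values, with the small remaining differences being Monte Carlo error):
colMeans(collector)                  # compare with c(b0, b1, b2) = c(2, 1.3, 5)
colMeans(collector) - c(b0, b1, b2)  # close to zero if unbiased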
Biased estimates refer to the idea that in the long run (i.e. in expectation) the parameter estimates differ from the true parameter values. One way to achieve this is to generate the response from a noncentral t-distribution with noncentrality parameter y_hat instead of adding zero-mean normal errors. The mean of a noncentral t is not its noncentrality parameter: it is the ncp multiplied by a factor that depends on the degrees of freedom (about 1.38 for df = 3), so the response is systematically scaled away from y_hat and the least-squares estimates inherit that distortion.
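You can check that factor directly (a quick sketch; the closed-form mean of a noncentral t holds for df > 1):
df_err <- 3
# E[noncentral t] = ncp * sqrt(df/2) * gamma((df - 1)/2) / gamma(df/2) for df > 1
sqrt(df_err / 2) * gamma((df_err - 1) / 2) / gamma(df_err / 2)   # about 1.38
mean(rt(1e6, df = df_err, ncp = 5)) / 5                          # simulated check, also about 1.38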
# True values
b0 <- 2
b1 <- 1.3
b2 <- 5
# Simulation set points
N <- 1000
n <- 100
set.seed(42)
collector <- matrix(ncol = 3, nrow = N)
colnames(collector) <- c("b0_hat", "b1_hat", "b2_hat")
for (i in 1:N) {
  # Generate data
  x1 <- rnorm(n, mean = 1, sd = 1)
  x2 <- rnorm(n, mean = 1, sd = 1)
  y_hat <- b0 + b1 * x1 + b2 * x2
  # Draw the response from a noncentral t-distribution using `rt`;
  # its mean is y_hat times a df-dependent factor, not y_hat itself
  y <- rt(n, df = 3, ncp = y_hat)
  # Fit data
  fit <- lm(y ~ x1 + x2)
  # Store results
  collector[i, ] <- fit$coefficients
}
Running the code below, you can now see that the estimates are biased.
# Graphs to show the bias
par(mfrow = c(3, 1))
hist(collector[, 1], main = expression(hat(beta)[0]), breaks = 30)
abline(v = b0, col = "red", lwd = 2)
hist(collector[, 2], main = expression(hat(beta)[1]), breaks = 30)
abline(v = b1, col = "red", lwd = 2)
hist(collector[, 3], main = expression(hat(beta)[2]), breaks = 30)
abline(v = b2, col = "red", lwd = 2)
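The same numeric check makes the bias explicit; each average estimate should be inflated by roughly the 1.38 factor noted above:
colMeans(collector)                   # systematically above the true values
colMeans(collector) / c(b0, b1, b2)   # each ratio is roughly 1.38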
Upvotes: 1