Reputation: 195
Simulate the conditions of linear regression and show that the estimates for multidimensional linear regression (three or more parameters) are unbiased. Then try to make the parameter estimates biased and show by simulation that you managed to achieve bias.
This is what I tried, but I am stuck on how to go from unbiased estimates to biased ones.
b0 <- 2
b1 <- 1.3
b2 <- 5
N <- 1000
matrica <- matrix(0, nrow = N, ncol = 3)
for (i in 1:N) {
  x1 <- rnorm(100)   ## expectation and variance are arbitrary
  x2 <- rnorm(100)
  err <- rnorm(100)
  y <- b0 + b1*x1 + b2*x2 + err
  fit <- lm(y ~ x1 + x2)   ## avoid calling the object `lm`, which masks the function
  matrica[i, 1] <- fit$coefficients[1]
  matrica[i, 2] <- fit$coefficients[2]
  matrica[i, 3] <- fit$coefficients[3]
}
head(matrica)   ## inspect the first few rows of estimates
rez1 <- matrica[, 1]
rez2 <- matrica[, 2]
rez3 <- matrica[, 3]
## now we need to show that the estimates are unbiased (b0 ~ mean(rez1), etc.)
summary(rez1)
summary(rez2)
summary(rez3)
cor(rez1, rez2)   ## highly correlated
cor(rez2, rez3)   ## highly correlated
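A more direct numeric check (a quick sketch: each column mean should sit near the corresponding true coefficient, up to simulation error):
colMeans(matrica)                  ## compare with c(b0, b1, b2)
colMeans(matrica) - c(b0, b1, b2)  ## differences should be near zero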
Upvotes: 2
Views: 680
Reputation: 2368
OK, so similar to the way you started, you could do the following:
# True values
b0 <- 2
b1 <- 1.3
b2 <- 5
# Simulation set points
N <- 1000   # number of simulated data sets
n <- 100    # observations per data set
set.seed(42)
collector <- matrix(ncol = 3, nrow = N)
colnames(collector) <- c("b0_hat", "b1_hat", "b2_hat")
for (i in 1:N) {
  # Generate data
  x1 <- rnorm(n, mean = 1, sd = 1)
  x2 <- rnorm(n, mean = 1, sd = 1)
  y_hat <- b0 + b1 * x1 + b2 * x2
  # Add noise
  y <- rnorm(n, y_hat, 1)
  # Fit data
  fit <- lm(y ~ x1 + x2)
  # Store results
  collector[i, ] <- fit$coefficients
}
Then to show the results, you could plot histograms and check that the mean of the estimates approaches the true parameter value, beta.
# Graphs to show unbiasedness
par(mfrow = c(3, 1))
hist(collector[, 1], main = expression(hat(beta)[0]), breaks = 30)
abline(v = b0, col = "red", lwd = 2)
hist(collector[, 2], main = expression(hat(beta)[1]), breaks = 30)
abline(v = b1, col = "red", lwd = 2)
hist(collector[, 3], main = expression(hat(beta)[2]), breaks = 30)
abline(v = b2, col = "red", lwd = 2)
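As a numeric companion to the plots (the averages should land close to the true values, with the small remaining differences being Monte Carlo error):
colMeans(collector)                  # compare with c(b0, b1, b2) = c(2, 1.3, 5)
colMeans(collector) - c(b0, b1, b2)  # close to zero if unbiased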
Biased estimates refer to the idea that in the long run (i.e. in expectation) the parameter estimates differ from the true parameter values. One way to achieve this is to generate the response from a noncentral t-distribution with noncentrality parameter y_hat instead of adding zero-mean normal errors. The mean of a noncentral t is not its noncentrality parameter: it is the ncp multiplied by a factor that depends on the degrees of freedom (about 1.38 for df = 3), so the response is systematically scaled away from y_hat and the least-squares estimates inherit that distortion.
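You can check that factor directly (a quick sketch; the closed-form mean of a noncentral t holds for df > 1):
df_err <- 3
# E[noncentral t] = ncp * sqrt(df/2) * gamma((df - 1)/2) / gamma(df/2) for df > 1
sqrt(df_err / 2) * gamma((df_err - 1) / 2) / gamma(df_err / 2)   # about 1.38
mean(rt(1e6, df = df_err, ncp = 5)) / 5                          # simulated check, also about 1.38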
# True values
b0 <- 2
b1 <- 1.3
b2 <- 5
# Simulation set points
N <- 1000
n <- 100
set.seed(42)
collector <- matrix(ncol = 3, nrow = N)
colnames(collector) <- c("b0_hat", "b1_hat", "b2_hat")
for (i in 1:N) {
  # Generate data
  x1 <- rnorm(n, mean = 1, sd = 1)
  x2 <- rnorm(n, mean = 1, sd = 1)
  y_hat <- b0 + b1 * x1 + b2 * x2
  # Draw the response from a noncentral t-distribution using `rt`;
  # its mean is y_hat times a df-dependent factor, not y_hat itself
  y <- rt(n, df = 3, ncp = y_hat)
  # Fit data
  fit <- lm(y ~ x1 + x2)
  # Store results
  collector[i, ] <- fit$coefficients
}
Running the code below, you can now see that the estimates are biased.
# Graphs to show the bias
par(mfrow = c(3, 1))
hist(collector[, 1], main = expression(hat(beta)[0]), breaks = 30)
abline(v = b0, col = "red", lwd = 2)
hist(collector[, 2], main = expression(hat(beta)[1]), breaks = 30)
abline(v = b1, col = "red", lwd = 2)
hist(collector[, 3], main = expression(hat(beta)[2]), breaks = 30)
abline(v = b2, col = "red", lwd = 2)
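The same numeric check makes the bias explicit; each average estimate should be inflated by roughly the 1.38 factor noted above:
colMeans(collector)                   # systematically above the true values
colMeans(collector) / c(b0, b1, b2)   # each ratio is roughly 1.38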
Upvotes: 1