Ricky.L
Ricky.L

Reputation: 47

Plotting after Doing Simulation of Linear Regression with R

I am doing the simulation of linear regression with R.

A regression model I consider is y_i = a + b_1 * x_1i + b_2 * x_2i + e_i.

The parameter design as follows:

x_1i ~ N(2,1), x_2i ~ Poisson(4), e_i ~ N(0, 1), theta = (a, b_1, b_2)

The following code I am doing is that I would like to generate 100 independent random samples of (y, x_1, x_2) 1000 times using the distribution which I have mentioned above, and I also want to estimate theta_hat (the estimator of theta). After getting the theta_hat, I would like to plot the distribution of estimator of a (a_hat), b_1 (b_1_hat), b_2 (b_2_hat), respectively.

## Construct 1000 x_1
x_1_1000 <- as.data.frame(replicate(n = 1000,expr = rnorm(n = 100, 
  mean = 2, sd = 1)))
colnames(x_1_1000) <- paste("x_1", 1:1000, sep = "_")

x_2_1000 <- as.data.frame(replicate(n = 1000,expr = rpois(n = 100, 
  lambda = 4)))
colnames(x_2_1000) <- paste("x_2", 1:1000, sep = "_")

error_1000 <- as.data.frame(replicate(n = 1000, expr = rnorm(n = 100,
  mean = 0, sd = 1)))
colnames(error_1000) <- paste("e", 1:1000, sep = "_")

y_1000 <- as.data.frame(matrix(data = 0, nrow = 100, ncol = 1000))
y_1000 = 1 + x_1_1000 * 1 + x_2_1000*(-2) + error_1000
colnames(y_1000) <- paste("y", 1:1000, sep = "_")
######################################################################
lms <- lapply(1:1000, function(x) lm(y_1000[,x] ~ x_1_1000[,x] + x_2_1000[,x]))
theta_hat_1000 <- as.data.frame(sapply(lms, coef))

After doing the linear regression, I just store the result into lms which is a list. Because I just want the data of coefficient, I store those simulation coefficients into "theta_hat_1000" However, when I wanna plot the distribution graph, I cannot get what I want in the final. I have tried two ways to solve the problem but still being confused.

The first way I tried is that I just rename the data frame "theta_hat_1000". I have successfully renamed the column_i, where i from 1 to 1000. However, I just cannot successfully rename the rows.

rownames(theta_hat_1000[1,]) <- "ahat"
rownames(theta_hat_1000[2,]) <- "x1hat"
rownames(theta_hat_1000[3,]) <- "x2hat"

The code listed above did not show any error message but finally failed to change the row names. Thus, I have tried the following code

rownames(theta_hat_1000) <- c("ahat", "x1hat", "x2hat")

This has successfully renamed. However, when I want to check there is anything store in the data frame, it reports "NULL"

theta_hat_1000$ahat

NULL

Therefore, I notice that there is something weird. Thus I have tried the second way like the following.

I have tried to unlist "theta_hat_1000" which is a list stored in my global environment. However, after doing such things, I am not getting what I want. The expected result is just getting three rows and each row with 1000 values, but the actual is that I got 3000 obs with 1 column.

The ideal result is getting three columns and each column with 1000 values and putting them into a data frame to do a further process like using ggplot to demonstrate the distribution of estimated coefficients.

I have stuck on this for several hours. It would be appreciated if anyone can help me and give me some suggestions.

Upvotes: 0

Views: 198

Answers (1)

AndreasM
AndreasM

Reputation: 952

This line theta_hat_1000$ahat in your code does not work, because "ahat" is a rowname not a column name in the data frame. You would get the result by calling theta_hat_1000["ahat",].

However, I understand that your desired result is actually a dataframe with 3 columns (and 1000 rows) representing the 3 parameters of your regression model (intercept, x1, x2). This line in your code as.data.frame(sapply(lms, coef)) produces a dataframe with 3 rows and 1000 columns. You can, for instance, transpose the matrix before changing it into a data frame to get 1000 rows and 3 columns.

theta_hat_1000 <- sapply(lms, coef)
theta_hat_1000 <- as.data.frame(t(theta_hat_1000))
colnames(theta_hat_1000) <- c("ahat", "x1hat", "x2hat")

head(theta_hat_1000)
       ahat     x1hat     x2hat
1 2.0259326 0.7417404 -2.111874
2 0.7827929 0.9437324 -1.944320
3 1.1034906 1.0091594 -2.035405
4 0.9677150 0.8168757 -1.905367
5 1.0518646 0.9616123 -1.985357
6 0.8600449 1.0781489 -2.017061

Now you could also call the variables with theta_hat_1000$ahat.

Upvotes: 1

Related Questions