Gritti
Gritti

Reputation: 85

aggregating regression outputs in R

I am performing multiple pooled cross-sectional regressions with a loop function and stored the regression outputs in a list (regression). What i would like to do now is to efficiently obtain the average coefficients, the average t-stats as well as the average adj.r squared.

I put up the following code already:

library(plm)
data("Grunfeld", package="plm")

# create list with regression outputs
regression <- list()

# Regression on past six-year subsets of Grunfeld in every year from 1940 to 1950
for(t in 1940:1950){

  regression[[as.character(t)]] <- lm(inv ~ value + capital, 
                              subset(Grunfeld, year<=t & year>=t-5))
}

This way I obtain the desired regression output stored in a list (regression). What i would like to do now is to efficiently obtain the average coefficients, the average t-stats as well as the average adj.r squared.

I already tried to calculated the mean of all the adj. r squared with the following:

mean(lapply(regression, function(x) summary(x)$adj.r.squared))

However it seems that I am using the mean function wrong as i get the following error.

Warning message:
In mean.default(lapply(regression, function(x) summary(x)$adj.r.squared)) :
  argument is not numeric or logical: returning NA

Further i came up with the following to "extract" the coefficients.

lapply(regression, function(x) summary(x)$coefficients)

How can I quickly obtain the average individual coefficients from this lapply output? (i.e extracting each row individually and calculate the respective mean over the years.)

$`1940`
               Estimate   Std. Error    t value     Pr(>|t|)
(Intercept) -3.65239712 14.647050149 -0.2493606 8.039783e-01
value        0.08283141  0.006873563 12.0507230 2.615793e-17
capital      0.11033307  0.091543522  1.2052526 2.330857e-01

$`1941`
                Estimate   Std. Error   t value     Pr(>|t|)
(Intercept) -13.77258038 16.677399231 -0.825823 4.123477e-01
value         0.08614094  0.007258571 11.867480 4.904857e-17
capital       0.18680229  0.094849038  1.969470 5.376624e-02

....

Upvotes: 1

Views: 513

Answers (2)

akrun
akrun

Reputation: 887158

You could try:

 library(reshape2)
  dcast(melt(lapply(regression, 
       function(x) summary(x)$coefficients)), Var1~Var2, value.var="value", mean)
 #         Var1    Estimate Std. Error   t value     Pr(>|t|)
 #1 (Intercept) -16.7072859 16.0876958 -1.029145 3.320868e-01
 #2       value   0.1107460  0.0076057 14.599109 1.510115e-17
 #3     capital   0.1279743  0.0685896  1.833861 9.389504e-02

Or

 Reduce(`+`,lapply(regression, function(x) summary(x)$coefficients))/length(regression)
 #                    Estimate Std. Error   t value     Pr(>|t|)
 #(Intercept) -16.7072859 16.0876958 -1.029145 3.320868e-01
 #value         0.1107460  0.0076057 14.599109 1.510115e-17
 #capital       0.1279743  0.0685896  1.833861 9.389504e-02

Upvotes: 2

landroni
landroni

Reputation: 2988

You almost had it right! Try this:

> sapply(regression, function(x) mean(summary(x)$adj.r.squared))
     1940      1941      1942      1943      1944      1945      1946      1947      1948      1949 
0.7230061 0.7293396 0.7399216 0.7770505 0.7998859 0.8413422 0.8571037 0.8561229 0.8348950 0.8357761 
     1950 
0.8324654 

You may as well use lapply() in the above. The same can be done for any coefficient, t-test, once you identify how to extract them from the coefficient table.

To extract coefficients for value you can proceed as follows:

lapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])

A more compact version would be:

sapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])

From the above you can obtain means as follows:

> (x <- t(sapply(regression, function(x) summary(x)$coefficients[ rownames(summary(x)$coefficients)=="value", ])))
       Estimate  Std. Error  t value     Pr(>|t|)
1940 0.08283141 0.006873563 12.05072 2.615793e-17
1941 0.08614094 0.007258571 11.86748 4.904857e-17
1942 0.09018745 0.007711639 11.69498 8.898811e-17
1943 0.09945565 0.007751087 12.83119 1.886416e-18
1944 0.10568804 0.007376523 14.32762 1.516617e-20
1945 0.11358598 0.006722166 16.89723 7.314450e-24
1946 0.12227763 0.006781509 18.03104 3.203995e-25
1947 0.12599497 0.007199027 17.50167 1.356383e-24
1948 0.12605599 0.008005481 15.74621 2.030259e-22
1949 0.12951740 0.008452725 15.32256 7.175275e-22
1950 0.13647072 0.009530406 14.31951 1.555615e-20
> colMeans(x)
    Estimate   Std. Error      t value     Pr(>|t|) 
1.107460e-01 7.605700e-03 1.459911e+01 1.510115e-17 

This said, this looks suspiciously like the Fama-MacBeth estimates: Fama MacBeth standard errors in R. These can be easily obtained using pmg() in plm.

Upvotes: 4

Related Questions