Reputation: 10102
Should I avoid rolling and manually code rolling regressions? Or am I better off creating a giant panel with overlapping entries and using statsby? I.e., give each window its own by entry. In R I can pre-split the data into a list of data frames, which I think speeds up subsequent operations.
When I first switched from R to Stata a month ago I asked this on Statalist and the consensus was that it should take a long time. I coded and compiled OLS in Mata and noticed no speed improvement (actually, a slight worsening).
It seems rolling regressions are a common technique and Stata seems pretty sophisticated; are most researchers running these regressions for 1+ days? Or are they using SAS for these calculations? For example, I run the following on the Compustat database from 1975 to 2010 (about 30,000 regressions) and it takes about 12 hours.
rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
saving(arbrisk, replace) nodots: regress r1 ewretd
Upvotes: 1
Views: 5260
Reputation: 1
The asreg community-contributed command does it so quickly it is not even funny. I had the usual loop code run for 36 hours, and then the same thing ran with asreg in less than 5 minutes.
Apparently, most of the time per regression is wasted on choosing the subset of observations to run the regression on, and that is O(N), with N being the total number of observations in the dataset. It seems asreg implements that mess in Mata.
This will implement a standard CAPM rolling regression:
bysort permno: asreg mret_rf mkt_rf, wind(month 60)
with permno being the firm identifier, mret_rf the monthly firm return minus the risk-free rate, mkt_rf the monthly market return minus the risk-free rate, month the name of the date variable identifying the month, and 60 the size of the rolling window in months.
To install asreg in Stata:
ssc install asreg
Upvotes: 0
Reputation: 10102
It is indeed much faster to "manually" regress with summations than it is to use rolling with regress. The code below runs about 400 times faster than rolling with regress. Of course, rolling is more extensible, but if you only want beta, alpha, R^2, and sigma^2, then this will do the trick.
program rolling_beta
    version 11.2
    syntax varlist(numeric), window(real)

    * get dependent and independent vars from varlist
    * (the s#. operator used below requires tsset or xtset data)
    tempvar x y x2 y2 xy xs ys xys x2s y2s covxy varx vary
    tokenize "`varlist'"
    generate `y' = `1'
    generate `x' = `2'
    local w = `window'

    * generate products
    generate `xy' = `x'*`y'
    generate `x2' = `x'*`x'
    generate `y2' = `y'*`y'

    * generate cumulative sums
    generate `xs' = sum(`x')
    generate `ys' = sum(`y')
    generate `xys' = sum(`xy')
    generate `x2s' = sum(`x2')
    generate `y2s' = sum(`y2')

    * generate variances and covariances over the last `w' observations
    * (s`w'. of a cumulative sum is the sum over the window)
    generate `covxy' = (s`w'.`xys' - s`w'.`xs'*s`w'.`ys'/`w')/`w'
    generate `varx' = (s`w'.`x2s' - s`w'.`xs'*s`w'.`xs'/`w')/`w'
    generate `vary' = (s`w'.`y2s' - s`w'.`ys'*s`w'.`ys'/`w')/`w'

    * generate alpha, beta, r2, s2
    generate beta = `covxy'/`varx'
    generate alpha = (s`w'.`ys' - beta*s`w'.`xs')/`w'
    generate r2 = `covxy'*`covxy'/`varx'/`vary'
    generate s2 = `vary'*`w'*(1 - r2)/(`w' - 2)
end
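For anyone wondering why this works: a windowed sum is just the difference of two running sums, which is what the s`w'. operator returns when applied to a cumulative sum. A sketch of the algebra, with w the window length; the symbols S and Σ are notation introduced here for illustration and do not appear in the code:

```latex
\begin{align*}
S^{x}_{t} &= \sum_{i \le t} x_i, \qquad
\Sigma^{x}_{t} = S^{x}_{t} - S^{x}_{t-w} = \sum_{i=t-w+1}^{t} x_i \\
\widehat{\mathrm{cov}}_{t}(x,y) &= \frac{1}{w}\Bigl(\Sigma^{xy}_{t} - \frac{\Sigma^{x}_{t}\,\Sigma^{y}_{t}}{w}\Bigr), \qquad
\widehat{\mathrm{var}}_{t}(x) = \frac{1}{w}\Bigl(\Sigma^{x^{2}}_{t} - \frac{(\Sigma^{x}_{t})^{2}}{w}\Bigr) \\
\hat\beta_{t} &= \frac{\widehat{\mathrm{cov}}_{t}(x,y)}{\widehat{\mathrm{var}}_{t}(x)}, \qquad
\hat\alpha_{t} = \frac{\Sigma^{y}_{t} - \hat\beta_{t}\,\Sigma^{x}_{t}}{w} \\
R^{2}_{t} &= \frac{\widehat{\mathrm{cov}}_{t}(x,y)^{2}}{\widehat{\mathrm{var}}_{t}(x)\,\widehat{\mathrm{var}}_{t}(y)}, \qquad
s^{2}_{t} = \frac{w\,\widehat{\mathrm{var}}_{t}(y)\,\bigl(1 - R^{2}_{t}\bigr)}{w-2} = \frac{\mathrm{RSS}_{t}}{w-2}
\end{align*}
```

Under this reading, the e(rss) / e(N) statistic from the question is RSS/w, i.e. s2*(w - 2)/w. Two caveats: the s`w'. operator needs the data to be tsset (or xtset) first, and sum() treats missing values as zero, so windows that contain missing returns will not match what regress would report for them.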
Upvotes: 2
Reputation:
I think the people from Statalist are right when they say that this should take a long time. You are running 30,000 regressions on a large number of observations.
If you want to know where Stata is spending its time, you can use the profiler command.
profiler clear
profiler on
rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
saving(arbrisk, replace) nodots: regress r1 ewretd
profiler off
profiler report
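If you only need the total wall-clock time rather than a per-program breakdown, Stata's timer commands are a lighter alternative. A minimal sketch, assuming timer slot 1 is free (the slot number is an arbitrary choice) and reusing the variable names from the question:

```stata
* Time the whole rolling call with timer slot 1 (slot number is arbitrary)
timer clear 1
timer on 1
rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
    saving(arbrisk, replace) nodots: regress r1 ewretd
timer off 1

* Show the elapsed time in seconds for slot 1
timer list 1
```

Running this on a small subsample of firms first gives a rough idea of how the full run will scale before you commit to it.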
I wonder whether creating a giant panel will help. You are likely to run into memory problems. You should check beforehand how big your panel will be and how much memory it will take:
http://www.stata.com/support/faqs/data/howbig.html
I am not surprised that using a self-coded OLS routine does not improve performance. The regress command is a so-called built-in command and is already pretty efficient. It will be hard to do better.
As far as SAS is concerned, run a couple of regressions in SAS and check how much time it takes. Do the same in Stata. My experience has been that Stata's regress is a bit faster than proc reg in SAS.
Upvotes: 3