Richard Herron

Reputation: 10102

Speeding up rolling regressions in Stata

Should I avoid rolling and code rolling regressions manually? Or am I better off creating a giant panel with overlapping entries and using statsby, i.e., giving each window its own by-group? In R I can pre-split the data into a list of data frames, which I think speeds up subsequent operations.

When I first switched from R to Stata a month ago I asked this on Statalist and the consensus was that it should take a long time. I coded and compiled OLS in Mata and noticed no speed improvement (actually, a slight worsening).

It seems rolling regressions are a common technique and Stata seems pretty sophisticated; are most researchers running these regressions for 1+ days? Or are they using SAS for these calculations? For example, I run the following on the Compustat database from 1975 to 2010 (about 30,000 regressions) and it takes about 12 hours.

rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
         saving(arbrisk, replace) nodots: regress r1 ewretd

Upvotes: 1

Views: 5260

Answers (3)

Robert Parham

Reputation: 1

The asreg community-contributed command does it so quickly it is not even funny. I had the usual loop code run for 36 hours, and then the same thing ran with asreg in less than 5 minutes.

Apparently, most of the time per regression is spent selecting the subset of observations to run the regression on, and that step is O(N), with N being the total number of observations in the dataset. It seems asreg implements that step in Mata.

This will implement a standard CAPM rolling regression:

bysort permno: asreg mret_rf mkt_rf, wind(month 60)

with permno being the firm identifier, mret_rf the monthly firm return minus the risk-free rate, mkt_rf the monthly market return minus the risk-free rate, month the name of the date variable identifying the month, and 60 the size of the rolling window in months.

To install asreg in Stata:

ssc install asreg
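
As a rough sketch of why subset selection matters: as far as I understand, an if qualifier is evaluated against every observation for each regression, whereas an in range on sorted data restricts Stata to a contiguous block of rows. The loop below assumes a single, gap-free monthly series with a date variable called month (a hypothetical name) and borrows r1, ewretd, and the 48-month window with a 12-month step from the question. It is only an illustration, not what asreg or rolling do internally.

```stata
* sketch: manual rolling regression using "in" ranges
* assumes ONE gap-free series; month, r1, ewretd are assumed variable names
sort month
local w = 48
local N = _N
generate double arbrisk = .
forvalues last = `w'(12)`N' {
    * window covers observations first..last (w observations)
    local first = `last' - `w' + 1
    quietly regress r1 ewretd in `first'/`last'
    * store the statistic on the last observation of the window
    quietly replace arbrisk = e(rss)/e(N) in `last'
}
```

With gaps in the dates or with several firms, observation ranges no longer line up with calendar windows, so this sketch would need adapting.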

Upvotes: 0

Richard Herron

Reputation: 10102

It is indeed much faster to "manually" regress with summations than it is to use rolling with regress. The code below runs about 400 times faster than rolling with regress. Of course, rolling is more extensible, but if you only want beta, alpha, R^2, and sigma^2, then this will do the trick.

program rolling_beta
    version 11.2
    syntax varlist(numeric), window(real)

    * get dependent and independent vars from varlist
    * (data must be tsset or xtset for the s#. operator used below)
    tempvar x y x2 y2 xy xs ys xys x2s y2s covxy varx vary
    tokenize "`varlist'"
    generate `y' = `1' 
    generate `x' = `2' 
    local w = `window' 

    * generate products
    generate `xy' = `x'*`y'
    generate `x2' = `x'*`x'
    generate `y2' = `y'*`y'

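    * generate cumulative sums (stored as double to limit rounding error
    * in long running sums)
    generate double `xs' = sum(`x')
    generate double `ys' = sum(`y')
    generate double `xys' = sum(`xy')
    generate double `x2s' = sum(`x2')
    generate double `y2s' = sum(`y2')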

    * generate variances and covariances
    generate `covxy' = (s`w'.`xys' - s`w'.`xs'*s`w'.`ys'/`w')/`w'
    generate `varx' = (s`w'.`x2s' - s`w'.`xs'*s`w'.`xs'/`w')/`w'
    generate `vary' = (s`w'.`y2s' - s`w'.`ys'*s`w'.`ys'/`w')/`w'

    * generate alpha, beta, r2, s2
    generate beta = `covxy'/`varx'
    generate alpha = (s`w'.`ys' - beta*s`w'.`xs')/`w'
    generate r2 = `covxy'*`covxy'/`varx'/`vary'
    generate s2 = `vary'*`w'*(1 - r2)/(`w' - 2)

end
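
A minimal usage sketch, assuming a single gap-free monthly series whose date variable is called month (a hypothetical name) and the r1 and ewretd variables from the question. Because s48. applied to a cumulative sum gives the sum over the last 48 observations, the final window can be cross-checked against regress on the last 48 rows:

```stata
* sketch: run rolling_beta and cross-check the last window against regress
* assumes one gap-free series; month is an assumed variable name
sort month
tsset month
rolling_beta r1 ewretd, window(48)

* the window ending at the last observation covers rows _N-47 through _N
local last = _N
local first = `last' - 47
quietly regress r1 ewretd in `first'/`last'
display "beta:  regress = " _b[ewretd] "   rolling_beta = " beta[`last']
display "alpha: regress = " _b[_cons]  "   rolling_beta = " alpha[`last']
display "s2:    regress = " e(rmse)^2  "   rolling_beta = " s2[`last']
```

The two sets of numbers should agree only up to rounding, since the program works from differenced running sums rather than refitting each window.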

Upvotes: 2

user872324

I think the people from Statalist are right when they say that this should take a long time. You are running 30,000 regressions on a large number of observations.

If you want to know where Stata is spending its time, you can use the profiler command.

profiler clear
profiler on
rolling arbrisk = (e(rss) / e(N)), window(48) stepsize(12) ///
     saving(arbrisk, replace) nodots: regress r1 ewretd
profiler off
profiler report

I wonder whether creating a giant panel will help, since you are likely to run into memory problems. You should check beforehand how big your panel will be and how much memory it will take:

http://www.stata.com/support/faqs/data/howbig.html
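
As a back-of-the-envelope sketch of that check: describe returns the number of observations in r(N) and the width of one observation in bytes in r(width). With the question's window(48) and stepsize(12), each observation would be copied into up to 48/12 = 4 overlapping windows, so the stacked panel would be roughly four times the current size, plus a window identifier. The factor of 4 is my assumption based on those settings.

```stata
* sketch: estimate the size of the data and of the overlapping panel
quietly describe
display "current data (bytes):            " %15.0fc r(N)*r(width)
* each observation appears in up to window/stepsize = 48/12 = 4 windows
display "overlapping panel, approx bytes: " %15.0fc r(N)*r(width)*48/12
```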

I am not surprised that using a self-coded OLS routine does not improve performance. The regress command is a so-called built-in command and is already pretty efficient. It will be hard to do better.

As far as SAS is concerned, run a couple of regressions in SAS and check how much time it takes. Do the same in Stata. My experience has been that Stata's regress is a bit faster than proc reg in SAS.
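
On the Stata side, a timing run could be sketched with the timer command. The 1,000 repetitions and the 48-observation range are arbitrary choices for illustration; r1 and ewretd are the variables from the question.

```stata
* sketch: time a batch of identical 48-observation regressions
timer clear 1
timer on 1
forvalues i = 1/1000 {
    quietly regress r1 ewretd in 1/48
}
timer off 1
* reports elapsed seconds for timer 1
timer list 1
```

Dividing the reported time by 1,000 gives a rough per-regression cost to compare against the equivalent SAS run.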

Upvotes: 3
