CasperVanBrink
CasperVanBrink

Reputation: 15

Summing up variable by preset N observations

I am running into problems doing a fairly basic summation. My dataset is composed of company ID's (cusip8) and their daily (date) abnormal returns (AR). I need to sum the abnormal returns of each company, from days t+3 until t+60 forward.

cusip8  year date                    ret                  vwretd            AR 
"00030710" 2014 19998                     .            .0007672172             . .
"00030710" 2014 19999   .008108087815344334             .009108214  -.0010001262 .
"00030710" 2014 20002    .03163539618253708             -.00158689    .033222288 .
"00030710" 2014 20003                     0   -.014999760000000001     .01499976 .
"00030710" 2014 20004  -.005717287305742502               .0158898    -.02160709 .
"00030710" 2014 20005   .006272913888096809             -.02121511    .027488023 .
"00030710" 2014 20006  -.012987012974917889             -.01333873    .000351717 .

I have tried the following:

sort cusip8 date
    by cusip8: gen CAR = AR if _n==1
(24,741,003 missing values generated)
    by cusip8: replace CAR = AR +CAR[_n-1] if _n>3 & if _n<60

And have yet been left with just .'s in the newly generated variable. Does anyone know how to solve this?

I am using Stata 16.0.

Upvotes: 0

Views: 279

Answers (1)

Nick Cox
Nick Cox

Reputation: 37358

You have more problems than one. First, let's address your problem report.

In each panel, CAR[2] is created missing by your code, which creates CAR only in the first observation. That messes up all subsequent calculations, as for example CAR[3] is AR[3] + CAR[2], and so missing, CAR[4] is AR[4] + CAR[3] and so missing, and so on.

Contrary to your claim, in each panel CAR[1] should be non-missing whenever AR is.

Second, evidently you have gaps for days 20000 and 20001 which were at a weekend. dow() returns 6 for Saturday and 0 for Sunday from daily dates (for which 0 is 1 January 1960).

. di dow(20000)
6

. di dow(20001)
0

. di %td 20000
04oct2014

So, either set up a business calendar to exclude weekends and holidays, or decide that you want just to use whatever is available within particular windows based on daily dates.

Third, your wording is not precisely enough to make your problem unambiguous to anyone who doesn't routinely deal with your kind of data. It seems that you seek a cumulative (running) sum but the window could just be one window (as your question literally implies) or a moving window (which I guess at). The function sum() gives cumulative or running sums: see help sum(). Just possibly,

bysort cusip8 (date): gen wanted = sum(AR) 

is a start on your solution. Otherwise, ssc describe rangestat shows you a command good for moving window calculations.

There are hundreds of posts in this territory on Statalist.

Upvotes: 1

Related Questions