Reputation: 65
I have a data set here.
An excerpt of my data looks like this: (For an enlarged version: http://puu.sh/79NCK.jpg)
(Note: there are no missing values in my dataset)
I wish to calculate the correlation matrix using a rolling window of 1 year. My period starts from 01 Jan 2008. So for example, the correlation between AUT
and BEL
on 01 Jan 2008 is calculated using the series of values from 01 Jan 2007 to 01 Jan 2008, and likewise for all other pairs. Similarly the correlation between AUT
and BEL
on 02 Jan 2008 is calculated using the series of values from 02 Jan 2007 to 02 Jan 2008.
Since there will be a different correlation matrix for each day, I want to output each day's correlation matrix into a sheet in excel and name that sheet COV1 (for 01 Jan 2008), COV2 (for 02 Jan 2008), COV3 (for 03 Jan 2008), and so on until COV1566 (for 31 Dec 2013). An excerpt of the output for each sheet is like this: (Note: with the titles included on the top row and first column)
I have loaded my datafile into SAS named rolling
. For the moment, my code is simply:
proc corr data = mm.rolling;
run;
Which simply calculates the correlation matrix using the entire series of values. I am very new to SAS, any help would be appreciated.
Upvotes: 0
Views: 1282
Reputation:
Think about how you might do if you had immense amount of patience.
proc corr data = mm.rolling out = correlation_as_of_01jan2008;
where date between '01jan2007'd and '01jan2008'd;
run;
Similarly,
proc corr data = mm.rolling out = correlation_as_of_02jan2008;
where date between '02jan2007'd and '02jan2008'd;
run;
Thankfully you can use SAS macro programming to achieve a similar effect as shown in this macro:
%macro rollingCorrelations(inputDataset=, refDate=);
/*first get a list of unique dates on or after the reference date*/
proc freq data = &inputDataset. noprint;
where date >="&refDate."d;
table date/out = dates(keep = date);
run;
/*for each date calculate what the window range is, here using a year's length*/
data dateRanges(drop = date);
set dates end = endOfFile
nobs= numDates;
format toDate fromDate date9.;
toDate=date;
fromDate = intnx('year', toDate, -1, 's');
call symputx(compress("toDate"!!_n_), put(toDate,date9.));
call symputx(compress("fromDate"!!_n_), put(fromDate, date9.) );
/*find how many times(numberOfWindows) we need to iterate through*/
if endOfFile then do;
call symputx("numberOfWindows", numDates);
end;
run;
%do i = 1 %to &numberOfWindows.;
/*create a temporary view which has the filtered data that is passed to PROC CORR*/
data windowedDataview / view = windowedDataview;
set &inputDataset.;
where date between "&&fromDate&i."d and "&&toDate&i."d;
drop date;
run;
/*the output dataset from each PROC CORR run will be
correlation_DDMMMYYY<from date>_DDMMMYY<start date>*/
proc corr data = windowedDataview
outp = correlations_&&fromDate&i.._&&toDate&i. (where=(_type_ = 'CORR'))
noprint;
run;
%end;
/*append all datasets into a single table*/
data all_correlations;
format from to date9.;
set correlations_:
indsname = datasetname
;
from = input(substr(datasetname,19,9),date9.);
to = input(substr(datasetname,29,9), date9.);
run;
%mend rollingCorrelations;
%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008)
The final output from the above macro will have from
& to
identifier to identify which date range each correlation matrix refers. Run it and examine the results.
I dont think excel can accomodate over 1500 tabs anyway, so best to keep it in a single table. The final table had 81K rows and the whole process ran in 2.5 mins.
update: to sort them by from
& to
proc sort data = ALL_CORRELATIONS;
by from to;
run;
Upvotes: 2