Reputation: 47
a quick question, I have data of the following sort:
Ticker _ Date _ Fem Analyst (dummy 1 if true) ___ Variables of that month like beta
AA _ 01/04/2001 _ 1 ___ 0.61
AA _ 05/04/2001 _ 1 ___ 0.62
AA _ 08/04/2001 _ 1 ___ 0.63
AA _ 01/05/2002 _ 1 ___ 0.7
AA _ 04/05/2002 _ 1 ___ 0.71
AA _ 08/07/2002 _ 0 ___ 0.8
AA _ 07/04/2003 _ 1 ___ 0.4
and so on.. What I want to receive is the following:
Ticker _ Date Number of fem analyst Number of Male Analysts _ Total ___Variables
AA _ 04/2001 3 0 _ 3 ___ 0.63
AA _ 05/2002 2 0 _ 2 ___ 0.71
AA _ 07/2002 0 1 _ 1 ___ 0.8
AA _ 04/2003 1 0 _ 1 ___ 0.4
So a counting algorithm that allows me to count the number of female and male analyst for a certain company per month( using dummy variable gender 0 or 1) and deletes all observations for that month except the most recent one (for instance for 08/04/01 this becomes 04/01 with 0.63 which is the most recent observation for beta for 04/01 for company AA) The example explains it all I guess?
Any ideas?
Upvotes: 0
Views: 2952
Reputation: 1276
You may want something like this:
/* Create the month variable into a string YYYY/MM */
data analysts0;
set <your data>;
format month $7.;
month=cats(year(date),'/',put(month(date),z2.));
run;
/* Sort so you can do the by processing required for counting */
proc sort data=analyst0 out=analyst1;
/* You need to include the date in the sort so the most recent is last */
by ticker month date;
run;
/* Count */
data count;
retain n_fem n_male 0;
set analyst1;
by ticker month;
if first.ticker of first.month then do;
n_fem=0;
n_male=0;
end;
else do;
if gender=1 then n_fem+1;
else if gender=0 then n_male+1;
else put 'Huh?';
end;
/* this outputs only the values you need.*/
if last.ticker or last.month then output;
run;
This should give you the general idea - I don't have access to SAS right now so I can't check the code. See the documentation for retain and by processing in the data step for more details.
Upvotes: 1