Reputation: 228
I have some data (created using the code) below that ranks observations according to two variables. In this case, it ranks the players first bet and second bet and creates two 'rank' variables. What I want to do instead is rank the observations according a function of the two variables instead (like the average of the two variables) and I'd like to do this in the PROC RANK command itself rather than using a preliminary data step as the ranking will get fairly involved after I replicate this on all the variables I need. Can I put operators into the PROC RANK statement? Rather than doing this:
Proc rank data=want ties=mean out=ranked groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
I would like to do this:
Proc rank data=want ties=mean out=ranked groups=2;
var avg(bet1stake, bet2stake);
ranks firstTwoBetsRank;
run;
Is this possible?
This is how the full example data can be created.
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
proc sort data=have;
by username betdate;
run;
data have;
set have;
by username betdate;
retain eventTime;
if first.username then eventTime = 0;
if first.betdate then eventTime + 1;
run;
proc sql;
create table want as
select
distinct username,
(select distinct stake from have where username = main.username and eventTime = 1) as bet1Stake,
(select distinct stake from have where username = main.username and eventTime = 2) as bet2Stake
from have main;
quit;
Proc rank data=want ties=mean out=want groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
Thanks for any help on this.
Upvotes: 0
Views: 927
Reputation:
I'm afraid you cannot apply operators on the variables you'd like to rank your observations.
The choice you have is either to use a DATA
step to do both the application of operators and the calculation of the ranking
Or
use a Data step view
or SQL view
to apply the operator as an intermediate step just in case if you are concerned about disk space.
In case you are pulling the data from a SQL database (assuming it supports windowing functions) you should be to do exactly what you are trying to do just with some SQL code that is passed-through to the database.
Upvotes: 2