user2146441
user2146441

Reputation: 228

SAS running total

I have some sample data as follows, and want to calculate the number of winning or losing bets in a row.

data have;
   input username $  betdate : datetime. stake winnings;
   dateOnly = datepart(betdate) ;
   format betdate DATETIME.;
   format dateOnly ddmmyy8.;
   datalines; 
    player1 12NOV2008:12:04:01 90 -90 
    player1 04NOV2008:09:03:44 100 40 
    player2 07NOV2008:14:03:33 120 -120 
    player1 05NOV2008:09:00:00 50 15 
    player1 05NOV2008:09:05:00 30 5 
    player1 05NOV2008:09:00:05 20 10 
    player2 09NOV2008:10:05:10 10 -10 
    player2 15NOV2008:15:05:33 35 -35 
    player1 15NOV2008:15:05:33 35 15 
    player1 15NOV2008:15:05:33 35 15 
run;
PROC PRINT; RUN;
proc sort data=have;
   by username betdate;
run;
DM "log; clear;";
data want;
   set have;
    by username dateOnly betdate;   
   retain calendarTime eventTime cumulativeDailyProfit profitableFlag;
   if first.username then calendarTime = 0;
   if first.dateOnly then calendarTime + 1;
   if first.username then eventTime = 0;
   if first.betdate then eventTime + 1;
   if first.username then cumulativeDailyProfit = 0;
   if first.dateOnly then cumulativeDailyProfit = 0;
   if first.betdate then cumulativeDailyProfit + stake;
   if winnings > 0 then winner = 1;
  if winnings <= 0 then winner = 0;
 PROC PRINT; RUN;

For example, the first four bets four player1 are winners, so the first four rows in this column should show 1,2,3,4 (at this point, four wins in a row). The fifth is a loser, so should show -1, followed by 1,2. The following three rows (for player 3, should show -1, -2, -3 as the customer has had three bets in a row. How can I calculate the value of this column in the data step? How can I also have a column for the largest number of winning bets in a row (to date) and the maximum number of losing bets the customer has had to date in each row?

Thanks for any help.

Upvotes: 1

Views: 935

Answers (1)

Joe
Joe

Reputation: 63424

To do a running total like this, you can use BY with NOTSORTED and still leverage the first.<var> functionality. For example:

data have;
input winlose $;
datalines;
win
win
win
win
lose
lose
win
lose
win
win
lose
;;;;
run;

data want;
set have;
by winlose notsorted;
if first.winlose and winlose='win' then counter=1;
else if first.winlose then counter=-1;
else if winlose='win' then counter+1;
else counter+(-1);
run;

Each time 'win' changes to 'lose' or the reverse, it resets the first.winlose variable to 1.

Once you have done this, you can either use a double DoW loop to append maximums, or perhaps more easily just get this value in a dataset and then add it on via a second datastep (or proc sql) to append your desired variables.

Upvotes: 3

Related Questions