Reputation: 228
I have some sample data as follows, and want to calculate the number of winning or losing bets in a row.
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
PROC PRINT; RUN;
proc sort data=have;
by username betdate;
run;
DM "log; clear;";
data want;
set have;
by username dateOnly betdate;
retain calendarTime eventTime cumulativeDailyProfit profitableFlag;
if first.username then calendarTime = 0;
if first.dateOnly then calendarTime + 1;
if first.username then eventTime = 0;
if first.betdate then eventTime + 1;
if first.username then cumulativeDailyProfit = 0;
if first.dateOnly then cumulativeDailyProfit = 0;
if first.betdate then cumulativeDailyProfit + stake;
if winnings > 0 then winner = 1;
if winnings <= 0 then winner = 0;
PROC PRINT; RUN;
For example, the first four bets four player1 are winners, so the first four rows in this column should show 1,2,3,4 (at this point, four wins in a row). The fifth is a loser, so should show -1, followed by 1,2. The following three rows (for player 3, should show -1, -2, -3 as the customer has had three bets in a row. How can I calculate the value of this column in the data step? How can I also have a column for the largest number of winning bets in a row (to date) and the maximum number of losing bets the customer has had to date in each row?
Thanks for any help.
Upvotes: 1
Views: 935
Reputation: 63424
To do a running total like this, you can use BY
with NOTSORTED
and still leverage the first.<var>
functionality. For example:
data have;
input winlose $;
datalines;
win
win
win
win
lose
lose
win
lose
win
win
lose
;;;;
run;
data want;
set have;
by winlose notsorted;
if first.winlose and winlose='win' then counter=1;
else if first.winlose then counter=-1;
else if winlose='win' then counter+1;
else counter+(-1);
run;
Each time 'win' changes to 'lose' or the reverse, it resets the first.winlose
variable to 1.
Once you have done this, you can either use a double DoW loop to append maximums, or perhaps more easily just get this value in a dataset and then add it on via a second datastep (or proc sql) to append your desired variables.
Upvotes: 3