user2146441
user2146441

Reputation: 228

Summary function in HAVING clause in SAS

I have the following query which returns a customer's username, the year and half-year, a count of bets for the first semi-year, a count of bets for the second semi-year and a profit calc.

proc sql;

  create table avg
  as
  select 
  username as username,
  year(datepart(betdate))*10 + floor( (qtr(datepart(betdate))+1)/2) as yearsemiyear, 
  count(bet_id) as betcount,

  sum( case when floor( (qtr(datepart(betdate))+1)/2) = 1 then 1 else 0 end ) as firstHalfBetcount,
  sum( case when floor( (qtr(datepart(betdate))+1)/2) = 2 then 1 else 0 end ) as secondHalfBetcount,

  round(mean((winnings - stake) / stake) * 100, 0.01) as averageMargin

  from &dsn
  WHERE datepart(BETDATE) > '31DEC2010'd AND datepart(BETDATE) <'01JAN2012'd 
  AND bet_type = 'SGL'   
  group by 1,2
 /* HAVING sum( case when floor( (qtr(datepart(betdate))+1)/2) = 1 then 1 else 0 end ) >= 4 */
  order by username desc, yearsemiyear asc;

quit;

How can I restrict this to customers who had at least 4 bets in the first half of the sample? If I uncomment the HAVING clause I have there, it only seems to return rows for the first half, and no second half rows are returned. I tried putting the restriction into the WHERE clause, but apparently you can't put a summary function in that CLAUSE.

How should I do this?

Thanks

Upvotes: 1

Views: 5975

Answers (2)

JaneGoGo
JaneGoGo

Reputation: 1

I have the issue, too. After detailed compare the right code and mine, I found that the problem is group by, which is only character variables. If we use numerical ones, then the error comes

Upvotes: 0

Joe
Joe

Reputation: 63424

You can reference the calculated columns in your HAVING clause. See for example:

proc sql;
select country, sum(case when region='EAST' then Actual else 0 end) as east_act,  
sum(case when region='WEST' then Actual else 0 end) as west_act
from sashelp.prdsale 
group by 1
having east_act > 120000;
quit;

Specific to your problem, you should change your HAVING clause to

HAVING firstHalfBetcount ge 4

I'm looking at this again, though, and I think your problem is that you're grouping by half year - so you wouldn't get records from the second half, as you'd never have a record that has both first half and second half data (if your data is as I think it is). You either need to not group by half year, and do this at the overall year level, or not separate first half/second half in columns.

If your goal is to get rows for both, but only include rows for people with 4 first half bets, you may need to create this table and then join it to a query of people with at least 4 first half bets, or use a where (select... ) subquery to filter down to those people; you can't do it with just grouping and having.

Upvotes: 2

Related Questions