Lijin Durairaj
Lijin Durairaj

Reputation: 5232

Use of HAVING without GROUP BY not working as expected

I am starting to learn SQL Server, in the documentation found in msdn states like this

HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.

This made me to think that we can use having without a groupBy clause, but when I am trying to make a query I am not able to use it.

I have a table like this

CREATE TABLE [dbo].[_abc]
(
    [wage] [int] NULL
) ON [PRIMARY]
GO

INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50) 
GO

Now when I run this query, I get an error

select * 
from [dbo].[_abc]
having sum(wage) > 5

Error:

enter image description here

Upvotes: 2

Views: 6640

Answers (3)

Salman Arshad
Salman Arshad

Reputation: 272106

HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:

  • The result will contain zero or one row
    • The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
    • HAVING will keep or eliminate that single row based on the condition
  • Any column in the SELECT clause needs to be wrapped inside an aggregate function
  • You can also specify an expression as long as it is not functionally dependent on the columns

Which means you can do this:

SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 5
-- Zero rows otherwise

Or even this:

SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 5
-- Zero rows otherwise

This construct is often used when you're interested in checking if a match for the aggregate was found:

SELECT *
FROM departments
WHERE EXISTS (
    SELECT 1
    FROM employees
    WHERE employees.department = departments.department
    HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total

Upvotes: 2

JohnLBevan
JohnLBevan

Reputation: 24410

The documentation is correct; i.e. you could run this statement:

select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc] 
having sum(wage) > 5

The reason your statement doesn't work is because of the select *, which means select every columns' value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves. * of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.

There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.

NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:

select * 
from [dbo].[_abc] 
where wage > 5

Equally, if you want the sum of all wages greater than 5 you can do this

select sum(wage) sum_of_wage_over_5 
from [dbo].[_abc] 
where wage > 5

Or if you wanted to compare the sum of wages over 5 with those under:

select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc] 
group by case when wage > 5 then 1 else 0 end 

See runnable examples here.


Update based on comments:

Do you need having to use aggregate functions?

No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.

The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by / in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1

What's the point?

It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.

One use case could be that you write code to monitor your licenses for some software; if you have less users than per-user-licenses all's good / you don't want to see the result since you don't care. If you have more users you want to know about it. E.g.

declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses 

Isn't the select irrelevant to the having clause?

Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.

I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:

select *
from [dbo].[_abc] 
where exists 
(
    select 1 
    from [dbo].[_abc] 
    having sum(wage) > 5
) 

However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.

Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:

select wage
from
(
    select sum(wage) sum_of_wage
    from [dbo].[_abc]
    group by 1
) singleRowResult
where sum_of_wage > 5

That won't run because wage is not available to the outer query; only sum_of_wage is returned.

Upvotes: 5

PrathapG
PrathapG

Reputation: 789

In SQL you cannot return aggregate functioned columns directly. You need to group the non aggregate fields

As shown below example

 USE AdventureWorks2012 ;  
GO  
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal  
FROM Sales.SalesOrderDetail  
GROUP BY SalesOrderID  
HAVING SUM(LineTotal) > 100000.00  
ORDER BY SalesOrderID ;  

In your case you don't have identity column for your table it should come as below

Alter _abc
Add Id_new Int Identity(1, 1)
Go

Upvotes: 0

Related Questions