Reputation: 5232

Use of HAVING without GROUP BY not working as expected

I am starting to learn SQL Server, in the documentation found in msdn states like this

HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.

This made me to think that we can use having without a groupBy clause, but when I am trying to make a query I am not able to use it.

I have a table like this

CREATE TABLE [dbo].[_abc]
(
    [wage] [int] NULL
) ON [PRIMARY]
GO

INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50) 
GO

Now when I run this query, I get an error

select * 
from [dbo].[_abc]
having sum(wage) > 5

Error:

Upvotes: 2

Answers (3)

Salman Arshad

Reputation: 272106

HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:

The result will contain zero or one row
- The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
- HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns

Which means you can do this:

SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 5
-- Zero rows otherwise

Or even this:

SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 5
-- Zero rows otherwise

This construct is often used when you're interested in checking if a match for the aggregate was found:

SELECT *
FROM departments
WHERE EXISTS (
    SELECT 1
    FROM employees
    WHERE employees.department = departments.department
    HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total

Upvotes: 2

JohnLBevan

Reputation: 24410

The documentation is correct; i.e. you could run this statement:

select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc] 
having sum(wage) > 5

The reason your statement doesn't work is because of the select *, which means select every columns' value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves. * of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.

There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.

NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:

select * 
from [dbo].[_abc] 
where wage > 5

Equally, if you want the sum of all wages greater than 5 you can do this

select sum(wage) sum_of_wage_over_5 
from [dbo].[_abc] 
where wage > 5

Or if you wanted to compare the sum of wages over 5 with those under:

select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc] 
group by case when wage > 5 then 1 else 0 end

See runnable examples here.

Update based on comments:

Do you need `having` to use aggregate functions?

No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.

The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by / in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1

What's the point?

It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.

One use case could be that you write code to monitor your licenses for some software; if you have less users than per-user-licenses all's good / you don't want to see the result since you don't care. If you have more users you want to know about it. E.g.

declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses

Isn't the select irrelevant to the having clause?

Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.

I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:

select *
from [dbo].[_abc] 
where exists 
(
    select 1 
    from [dbo].[_abc] 
    having sum(wage) > 5
)

However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.

Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:

select wage
from
(
    select sum(wage) sum_of_wage
    from [dbo].[_abc]
    group by 1
) singleRowResult
where sum_of_wage > 5

That won't run because wage is not available to the outer query; only sum_of_wage is returned.

Upvotes: 5

PrathapG

Reputation: 789

In SQL you cannot return aggregate functioned columns directly. You need to group the non aggregate fields

As shown below example

 USE AdventureWorks2012 ;  
GO  
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal  
FROM Sales.SalesOrderDetail  
GROUP BY SalesOrderID  
HAVING SUM(LineTotal) > 100000.00  
ORDER BY SalesOrderID ;

In your case you don't have identity column for your table it should come as below

Alter _abc
Add Id_new Int Identity(1, 1)
Go

Upvotes: 0