Krupal Shah
Krupal Shah

Reputation: 9187

wrapping inside aggregate function in SQL query

I have 2 tables called Orders and Salesperson shown below:

databse tables

And I want to retrieve the names of all salespeople that have more than 1 order from the tables above.

Then firing following query shows an error:

SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1

The error is:

Column 'Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

From the error and searching it on google, I could understand that the error is because of Name column must be either a part of the group by statement or aggregate function.

Also I tried to understand why does the selected column have to be in the group by clause or art of an aggregate function? But didn't understand clearly.

So, how to fix this error?

Upvotes: 0

Views: 2782

Answers (3)

Matt Caton
Matt Caton

Reputation: 3503

For readability and correctness, I usually split aggregate queries into two parts:

  1. The aggregate query
  2. Any additional queries to support fields not contained in aggregate functions

So:

1.Aggregate query - salespeople with more than 1 order

SELECT salesperson_id FROM ORDERS GROUP BY salespersonId HAVING COUNT(Number) > 1

2.Use aggregate as subquery (basically a select joining onto another select) to join on any additional fields:

SELECT * FROM Salesperson SP INNER JOIN ( SELECT salesperson_id FROM ORDERS GROUP BY salespersonId HAVING COUNT(Number) > 1 ) AGG_QUERY ON AGG_QUERY.salesperson_id = SP.ID

There are other approaches, such as selecting the additional fields via aggregation functions (as shown by the other answers). These get the code written quickly so if you are writing the query under time pressure you may prefer that approach. If the query needs to be maintained (and hence readable) I would favour subqueries.

Upvotes: 0

Jester
Jester

Reputation: 3317

SELECT max(Name) as Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1

The basic idea is that columns that are not in the group by clause need to be in an aggregate function now here due to the fact that the name is probably the same for every salesperson_id min or max make no real difference (the result is the same)

example

Looking at your data you have 3 entry's for Dan(7) now when a join is created the with row Dan (Name) gets multiplied by 3 (For every number 1 Dan) and then the server does not now witch "Dan" to pick cos to the server that are 3 lines even doh they are semantically the same

also try this so that you see what I am talking about:

SELECT Orders.Number,  Salesperson.Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID

As far as the query goes INNER JOIN is a better solution since its kinda the standard for this simple query it should not matter but in some cases can happen that INNER JOIN produces better results but as far as I know this is more of a legacy thing since this days the server should pretty much produce the same execution plan.

For code clarity I would stick with INNER JOIN

Upvotes: 1

Cheruvian
Cheruvian

Reputation: 5867

Assuming the name is unique to the salesperson.id then simply add it to your group by clause

 GROUP BY salesperson_id, salesperson.Name

Otherwise use any Agg function

 Select Min(Name)

The reason for this is that SQL doesn't know whether there are multiple name per salesperson.id

Upvotes: 1

Related Questions