Reputation: 5937
I saw an example where there was a list (table) of employees with their respective monthly salaries. I did a sum of the salaries and saw the exact same table in the output. That was strange.
Here is what has to be done - we have to find out how much money we pay this month as employee salaries. For that, we need to sum their salary amounts in the database as shown:
SELECT EmployeeID, SUM(MonthlySalary)
FROM Employee
GROUP BY EmployeeID
I know that I get an error if I don't use GROUP BY in the above code. This is what I don't understand.
We are selecting EmployeeID from the Employee table. SUM() is being told that it has to add up the MonthlySalary column from the Employee table. So it should go and add those numbers up directly, instead of grouping them and then adding them.
That's how a person would do it: look at the employee table and add all the numbers. Why would they take the trouble to group them and then add them up?
Upvotes: 31
Views: 42613
Reputation: 837946
If you wanted to add up all the numbers you would not have a GROUP BY:
SELECT SUM(MonthlySalary) AS TotalSalary
FROM Employee
+-----------+
|TotalSalary|
+-----------+
|777400 |
+-----------+
The point of the GROUP BY is that you get a separate total for each employee.
+--------+------+
|Employee|Salary|
+--------+------+
|John |123400|
+--------+------+
|Frank |413000|
+--------+------+
|Bill |241000|
+--------+------+
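As a rough way to try both queries side by side, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layout (a text EmployeeID holding the names from the table above, one row per employee) is an assumption made for illustration, not the asker's actual schema.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID TEXT, MonthlySalary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("John", 123400), ("Frank", 413000), ("Bill", 241000)],
)

# No GROUP BY: the aggregate runs over every row and yields a single total.
total = conn.execute(
    "SELECT SUM(MonthlySalary) AS TotalSalary FROM Employee"
).fetchone()[0]

# GROUP BY: the aggregate runs once per EmployeeID, one output row each.
per_employee = conn.execute(
    "SELECT EmployeeID, SUM(MonthlySalary) "
    "FROM Employee GROUP BY EmployeeID ORDER BY EmployeeID"
).fetchall()

print(total)         # 777400
print(per_employee)  # [('Bill', 241000), ('Frank', 413000), ('John', 123400)]
```

With only one row per employee, the grouped result simply repeats the table, which is the effect the question describes.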
Upvotes: 10
Reputation: 85036
It might be easier if you think of GROUP BY as "for each" for the sake of explanation. The query below:
SELECT empid, SUM (MonthlySalary)
FROM Employee
GROUP BY EmpID
is saying:
"Give me the sum of MonthlySalary's for each empid"
So if your table looked like this:
+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|2    |300          |
+-----+-------------+
result:
+-+---+
|1|200|
+-+---+
|2|300|
+-+---+
Sum wouldn't appear to do anything because the sum of one number is that number. On the other hand if it looked like this:
+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|1    |300          |
+-----+-------------+
|2    |300          |
+-----+-------------+
result:
+-+---+
|1|500|
+-+---+
|2|300|
+-+---+
Then SUM would have a visible effect, because there are two rows with empid 1 to add together. Not sure if this explanation helps or not, but I hope it makes things a little clearer.
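If you want to check the two cases above yourself, something like the following should work. It is only a sketch: it uses Python's built-in sqlite3 module, and the helper name salary_per_empid is made up for this example.

```python
import sqlite3

def salary_per_empid(rows):
    """Load (empid, MonthlySalary) rows and return the per-empid sums.

    Hypothetical helper for illustration; uses a throwaway in-memory table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (empid INTEGER, MonthlySalary INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT empid, SUM(MonthlySalary) "
        "FROM Employee GROUP BY empid ORDER BY empid"
    ).fetchall()
    conn.close()
    return result

# One row per empid: each group holds a single value, so SUM changes nothing.
one_row_each = salary_per_empid([(1, 200), (2, 300)])

# Two rows for empid 1: that group is collapsed into one row with the sum.
two_rows_for_1 = salary_per_empid([(1, 200), (1, 300), (2, 300)])

print(one_row_each)    # [(1, 200), (2, 300)]
print(two_rows_for_1)  # [(1, 500), (2, 300)]
```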
Upvotes: 64
Reputation: 1269443
The sad thing is that there is one database that supports the syntax you are suggesting:
SELECT EmployeeID, SUM (MonthlySalary)
FROM Employee
However, MySQL does not do what you expect. When the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default as of MySQL 5.7.5), it returns the overall sum of MonthlySalary for everyone, and one arbitrary EmployeeID. Alas.
Your question is about SQL syntax. The answer is that this is how SQL has been defined, and it is not going to change. Determining the aggregation fields from the SELECT clause is not unreasonable, but it is not how this language is defined.
I do, however, have some sympathy for the question. Many people learning SQL think of "grouping" as something done in the context of sorting the rows. Something like "sort the cities in the US and group them by state in the output". Makes sense. But "group by" in SQL really means "summarize by" not "keep together".
Upvotes: 8
Reputation: 780688
If you don't specify GROUP BY, aggregate functions operate over all the records selected. In that case, it doesn't make sense to also select a specific column like EmployeeID. Either you want per-employee totals, in which case you select the employee ID and group by employee, or you want a total across the entire table, so you leave out the employee ID and the GROUP BY clause.
In your query, if you leave out the GROUP BY, which employee ID would you like it to show?
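To make that ambiguity concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite is used only because it ships with Python; it happens to accept a bare column next to an aggregate, whereas stricter databases reject the query outright. The three sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, MonthlySalary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 200), (2, 300), (3, 400)],  # made-up sample data
)

# No GROUP BY, but EmployeeID is still selected alongside the aggregate.
rows = conn.execute(
    "SELECT EmployeeID, SUM(MonthlySalary) FROM Employee"
).fetchall()

shown_id, total = rows[0]
print(len(rows))  # 1: the whole table collapses into a single row
print(total)      # 900: the sum over every employee
print(shown_id)   # one of 1, 2 or 3; nothing in the query says which
```

The total is well defined, but the ID next to it is not, which is why most databases insist on either dropping the column or adding the GROUP BY.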
Upvotes: 4