Reputation: 11
What's the difference between DISTINCT and GROUP BY. I understand that DISTINCT removes replicated values and GROUP BY is often used with aggregate functions, but why is it that under the following code
SELECT DISTINCT continent FROM world
HAVING SUM(population) > 100000000;
only displays 1 continent and this:
SELECT continent FROM world
GROUP BY continent HAVING SUM(population) > 100000000;
Shows all each continent? I was doing question https://sqlzoo.net/wiki/SUM_and_COUNT. I'm trying to display each continent that has SUM(population) thats bigger than 100000000. Thanks guys.
Upvotes: 1
Views: 177
Reputation: 29647
In MySql, one can't really trust the results of an aggregate function like SUM when you use DISTINCT.
Because with DISTINCT it's not actually grouping on the other columns.
Note that in other databases than MySql & relatives, that one wouldn't face such confusion. Because those would complain about fields missing in the GROUP BY.
To explain it more visually.
With this sample data (MySql):
drop table if exists tmp_table;
create temporary table tmp_table (col1 int, col2 char(1), col3 int);
insert into tmp_table (col1, col2, col3) values
(1,'A',1),(2,'A',2),(3,'A',3),
(4,'B',4),(5,'B',5),(6,'B',6);
Via DISTINCT and SUM :
select distinct col2, sum(col3) from tmp_table;
Returns :
col2 sum(col3)
---- ---------
A 21
Via GROUP BY :
select col2, sum(col3) from tmp_table group by col2;
Returns :
col2 sum(col3)
---- ---------
A 6
B 15
Then when we include the HAVING clause to those queries
... having sum(col3) > 10;
Then the first select will return the wrong result: ('A',21)
And the second select will return what you'd expect : ('B',15)
Upvotes: 1