Reputation: 11171
If I have a table with duplicate ids I will get the same result if I use GROUP BY id
as if I use SELECT DISTINCT(id)
, right?
So when should I prefer one option over the other?
Upvotes: 3
Views: 2143
Reputation: 5803
An example of when you might prefer group by
over a distinct
. Consider a scenario where a window function
(not necessarily row_number()) needs to be applied on the distinct result set. Respecting the order of operation, you would have to go with something like this using distinct
select id, row_number() over (order by id) as rn
from (select distinct id from my_table) t;
The same could be achieved without using a subquery using group by
select id, row_number() over (order by id) as rn
from my_table
group by id;
This was possible because window functions
are applied after group by
but before distinct
Upvotes: 0
Reputation: 671
Just extra information:
It's better to use GROUP BY instead of DISTINCT if you are querying an indexed field and have a LIMIT, because it would use the index, and not a temporary table
See these links:
http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
"If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created"
Example:
MariaDB [my_db]> EXPLAIN SELECT DISTINCT p.data_prefix FROM my_table p;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| 1 | SIMPLE | p | range | NULL | data_prefix | 33 | NULL | 18 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
MariaDB [my_db]> EXPLAIN SELECT DISTINCT p.data_prefix FROM my_table p limit 0,40;
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------------------------------------+
| 1 | SIMPLE | p | range | NULL | data_prefix | 33 | NULL | 18 | Using index for group-by; Using temporary |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------------------------------------+
1 row in set (0.00 sec)
MariaDB [my_db]> EXPLAIN SELECT p.data_prefix FROM my_table p group by p.data_prefix;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| 1 | SIMPLE | p | range | NULL | data_prefix | 33 | NULL | 18 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
MariaDB [my_db]> EXPLAIN SELECT p.data_prefix FROM my_table p group by p.data_prefix limit 0,40;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| 1 | SIMPLE | p | range | NULL | data_prefix | 33 | NULL | 18 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
MariaDB [my_db]>
Upvotes: 0
Reputation: 425361
You should use GROUP BY
if you need aggregate functions, like SUM
, MAX
etc.
If you only need grouping columns, they are the same (and use the same plan).
Please note that DISTINCT
is not a function, so this clause:
SELECT DISTINCT(id), othercol
which is the same (except for column order) as
SELECT DISTINCT othercol, (id)
or just
SELECT DISTINCT othercol, id
might still give you duplicates on id
if there are records with same id
but different othercol
.
Upvotes: 5
Reputation: 2470
A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.
Example of DISTINCT:
SELECT DISTINCT Employee, Rank
FROM Employees
Example of GROUP BY:
SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank
Example of GROUP BY with aggregate function:
SELECT Employee, Rank, COUNT(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank
Reference : Pinal Dave (http://blog.SQLAuthority.com)
Upvotes: 2