doub1ejack
doub1ejack

Reputation: 11171

GROUP BY x vs DISTINCT( x )

If I have a table with duplicate ids I will get the same result if I use GROUP BY id as if I use SELECT DISTINCT(id), right?

So when should I prefer one option over the other?

Upvotes: 3

Views: 2143

Answers (4)

Rajat
Rajat

Reputation: 5803

An example of when you might prefer group by over a distinct. Consider a scenario where a window function (not necessarily row_number()) needs to be applied on the distinct result set. Respecting the order of operation, you would have to go with something like this using distinct

select id, row_number() over (order by id) as rn
from (select distinct id from my_table) t;

The same could be achieved without using a subquery using group by

select id, row_number() over (order by id) as rn 
from my_table
group by id;

This was possible because window functions are applied after group by but before distinct

Upvotes: 0

Bast
Bast

Reputation: 671

Just extra information:

It's better to use GROUP BY instead of DISTINCT if you are querying an indexed field and have a LIMIT, because it would use the index, and not a temporary table

See these links:


Example:

MariaDB [my_db]> EXPLAIN SELECT DISTINCT p.data_prefix FROM my_table p;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]> EXPLAIN SELECT DISTINCT p.data_prefix FROM my_table p limit 0,40;
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------------------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                                     |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------------------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by; Using temporary |
+------+-------------+-------+-------+---------------+------------+---------+------+------+-------------------------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]> EXPLAIN SELECT p.data_prefix FROM my_table p group by p.data_prefix;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]> EXPLAIN SELECT p.data_prefix FROM my_table p group by p.data_prefix limit 0,40;
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by |
+------+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]>

Upvotes: 0

Quassnoi
Quassnoi

Reputation: 425361

You should use GROUP BY if you need aggregate functions, like SUM, MAX etc.

If you only need grouping columns, they are the same (and use the same plan).

Please note that DISTINCT is not a function, so this clause:

SELECT DISTINCT(id), othercol

which is the same (except for column order) as

SELECT DISTINCT othercol, (id)

or just

SELECT DISTINCT othercol, id

might still give you duplicates on id if there are records with same id but different othercol.

Upvotes: 5

AltF4_
AltF4_

Reputation: 2470

A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.

Example of DISTINCT:
 SELECT DISTINCT Employee, Rank
 FROM Employees

Example of GROUP BY:
 SELECT Employee, Rank
 FROM Employees
 GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:
 SELECT Employee, Rank, COUNT(*) EmployeeCount
 FROM Employees
 GROUP BY Employee, Rank 

Reference : Pinal Dave (http://blog.SQLAuthority.com)

Upvotes: 2

Related Questions