Performance difference between DISTINCT and GROUP BY

Question

My understanding is that in (My)SQL a SELECT DISTINCT should do the same thing as a GROUP BY on all columns, except that GROUP BY does implicit sorting, so these two queries should be the same:

SELECT boardID,threadID FROM posts GROUP BY boardID,threadID ORDER BY NULL LIMIT 100;
SELECT DISTINCT boardID,threadID FROM posts LIMIT 100;

They're both giving me the same results, and they're giving identical output from EXPLAIN:

+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
|  1 | SIMPLE      | posts | ALL  | NULL          | NULL | NULL    | NULL | 1263320 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
1 row in set

But on my table the query with DISTINCT consistently returns instantly and the one with GROUP BY takes about 4 seconds. I've disabled the query cache to test this.

There's 25 columns so I've also tried creating a separate table containing only the boardID and threadID columns, but the same problem and performance difference persists.

I have to use GROUP BY instead of DISTINCT so I can include additional columns without them being included in the evaluation of DISTINCT. So now I don't how to proceed. Why is there a difference?

Performance difference between DISTINCT and GROUP BY

Answers (1)

Related Questions