Reputation: 84

Remove Duplicate record from Mysql Table using Group By

I have a table structure and data below.

I need to remove duplicate record from the table list. My confusion is that when I am firing query

SELECT * FROM `table` GROUP BY CONCAT(`name`,department)

then giving me correct list(12 records).

Same query when I am using the subquery:

SELECT * 
FROM `table` WHERE id IN (SELECT id FROM `table` GROUP BY CONCAT(`name`,department))

It returning all record which is wrong.

So, My question is why group by in subquery is not woking.

Upvotes: 1

Answers (3)

Ravinder Reddy

Reputation: 24002

You can JOIN the grouped ids with that of table ids, so that you can get desired results.

Example:

SELECT t.* FROM so_q32175332 t
  JOIN ( SELECT id FROM so_q32175332
          GROUP BY CONCAT( name, department ) ) f
    ON t.id = f.id
 ORDER BY CONCAT( name, department );

Here order by was added just to compare directly the * results on group.

Demo on SQL Fiddle: http://sqlfiddle.com/#!9/d715a/1

Upvotes: 1

Zafar Malik

Reputation: 6844

Actually as Tim mentioned in his answer that it to get first unique record by group by clause is not a standard feature of sql but mysql allows it till mysql5.6.16 version but from 5.6.21 it has been changed.

Just change mysql version in your sql fiddle and check that you will get what you want.

Upvotes: 1

Tim Biegeleisen

Reputation: 521259

In the query

SELECT * FROM `table` GROUP BY CONCAT(`name`,department)

You are selecting the id column, which is a non-aggregate column. Many RDBMS would give you an error, but MySQL allows this for performance reasons. This means MySQL has to choose which record to retain in the result set. Based on the result set in your original problem, it appears that MySQL is retaining the id of the first duplicate record, in cases where a group has more than one member.

In the query

SELECT * 
FROM `table`
WHERE id IN
(
    SELECT id FROM `table` GROUP BY CONCAT(`name`,department)
)

you are also selecting a non-aggregate column in the subquery. It appears that MySQL actually decides which id value to be retained in the subquery based on the id value in the outer query. That is, for each id value in table, MySQL performs the subquery and then selectively chooses to retain a record in the group if two id values match.

You should avoid using a non-aggregate column in a query with GROUP BY, because it is a violation of the ANSI standard, and as you have seen here it can result in unexpected results. If you give us more information about what result set you want, we can give you a correct query which will avoid this problem.

I welcome anyone who has documentation to support these observations to either edit my question or post a new one.

Upvotes: 1

Remove Duplicate record from Mysql Table using Group By

Answers (3)

Related Questions