Reputation:
I would like to get rid of duplicates in my DB. There can be several duplicates of one criterion, which are then grouped together.
Let's say B is duplicate of A, and C is also duplicate of A then there should be a result like
*id* | *duplicate*
A | B, C
But now the result is like:
*id* | *duplicate*
A | B, C
B | C
Which is correct of course. The problem is, that I would like that ids which already appeared as duplicates in the results were not listed again in the column id with their own duplicates.
Here is an example: http://sqlfiddle.com/#!9/61692/1/0
Any suggestions?
Thanks, Paul
Edit:
And here the source of the example (as recommended by Zohar Peled):
CREATE TABLE duplicates
(`id` int, `Name` varchar(7))
;
INSERT INTO duplicates
(`id`, `Name`)
VALUES
(1, 'Bob'),
(2, 'Bob'),
(3, 'Bob'),
(4, 'Alice')
;
SELECT DISTINCT d1.`id`, GROUP_CONCAT(d2.`id`) as duplicates
FROM `duplicates` as d1, `duplicates` as d2
WHERE
d1.`id`< d2.`id` AND
d1.`Name` = d2.`Name`
GROUP BY d1.`id`
Upvotes: 2
Views: 434
Reputation: 33945
This is a rather unorthodox solution, but hey...
SELECT MIN(x.id) id
, GROUP_CONCAT(DISTINCT y.id) duplicates
FROM duplicates x
JOIN duplicates y
ON y.name = x.name
AND y.id > x.id
GROUP
BY x.name
Upvotes: 1