Reputation: 1925
I am trying to figure out how to find duplicats based on several different columns and tables.
I've got these tables:
I want to find exact matches in my table products on columns productName
, brandid
, origin
. But to cast the row as a duplicate I also need to compare so that they have the exact same tags (column: tagid
) and groups (column: groupid
) assigned.
Every product may have multiple tags and multiple groups.
This is what I've come up with... but it's not quite doing what I need it to.
SQLFiddle http://sqlfiddle.com/#!9/43f19/1
In my SQL fiddle example I have listed 10 different products.
For example, products 1,2 are exact matches and thus should be listed as a duplicate. Product number 3 only has one group assigned and thus differ from product 1 and 2 even if any other parameter fits (it should not be listed). My intention with the dupid column would be to list the first entry of a set of duplicates.
id | name | brandid | origin | tags | groups | dupid
1 | prod | 1 | England | 1,2 | 1,2 | 1
2 | prod | 1 | England | 1,2 | 1,2 | 1
3 | prod | 1 | England | 1,2 | 1 | 3
Complete set of items that should be listed as exact matches in my SQL fiddle are:
id 1
id 2
id 4
id 5
My guess why this fails is that I have not succeeded to involve the tags and the groups correctly into my comparison.
SELECT m.*,dup.id AS dupid,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
FROM `products` m
JOIN (SELECT id,`productName`, brandid, origin, COUNT(*) AS c FROM products
GROUP BY `productName`, brandid, origin HAVING c > 1) dup ON m.`productName` = dup.`productName` AND m.brandid = dup.brandid AND m.origin = dup.origin
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN groups AS g ON g.productid = m.id
GROUP BY m.id
ORDER BY `productName`,brandid,origin
Any help and/or advice on how to achieve this is highly appricated.
Upvotes: 0
Views: 63
Reputation: 40481
My guess is that you are missing an aggregation function on the subquery on ID field, also - you need to group by productname,origin and brand and not id so try this:
SELECT m.*,dup.id AS dupid,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
FROM `products` m
JOIN (SELECT min(id) as id,`productName`, brandid, origin, COUNT(*) AS c FROM products
GROUP BY `productName`, brandid, origin HAVING c > 1) dup ON m.`productName` = dup.`productName` AND m.brandid = dup.brandid AND m.origin = dup.origin
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN groups AS g ON g.productid = m.id
GROUP BY m.`productName`,m.brandid,m.origin
ORDER BY m.`productName`,m.brandid,m.origin
EDIT : You can use this query:
SELECT tt.*
FROM(
SELECT m.*,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
FROM `products` m
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN groups AS g ON g.productid = m.id
GROUP BY m.id) tt
INNER JOIN
(SELECT productName,brandid,origin,alltags,groups
FROM
(SELECT m.*,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
FROM `products` m
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN groups AS g ON g.productid = m.id
GROUP BY m.id) s
GROUP BY productName,brandid,origin,alltags,groups
HAVING COUNT(*) > 1) ss
ON(tt.productName = ss.productName and tt.brandid = ss.brandid and tt.origin = ss.origin
and tt.alltags = ss.alltags and tt.groups = ss.groups)
Upvotes: 1