faerin
faerin

Reputation: 1925

Find duplicate entries on several columns and tables

I am trying to figure out how to find duplicats based on several different columns and tables.

I've got these tables:

I want to find exact matches in my table products on columns productName, brandid, origin. But to cast the row as a duplicate I also need to compare so that they have the exact same tags (column: tagid) and groups (column: groupid) assigned.

Every product may have multiple tags and multiple groups.

This is what I've come up with... but it's not quite doing what I need it to.

SQLFiddle http://sqlfiddle.com/#!9/43f19/1

In my SQL fiddle example I have listed 10 different products.

For example, products 1,2 are exact matches and thus should be listed as a duplicate. Product number 3 only has one group assigned and thus differ from product 1 and 2 even if any other parameter fits (it should not be listed). My intention with the dupid column would be to list the first entry of a set of duplicates.

id   |  name  |   brandid   |   origin   |    tags    |    groups   |   dupid
 1   |  prod  |      1      |  England   |    1,2     |    1,2      |     1    
 2   |  prod  |      1      |  England   |    1,2     |    1,2      |     1
 3   |  prod  |      1      |  England   |    1,2     |      1      |     3

Complete set of items that should be listed as exact matches in my SQL fiddle are:

id 1
id 2
id 4
id 5

My guess why this fails is that I have not succeeded to involve the tags and the groups correctly into my comparison.

SELECT m.*,dup.id AS dupid,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
FROM `products` m
JOIN (SELECT id,`productName`, brandid, origin, COUNT(*) AS c FROM products 
GROUP BY `productName`, brandid, origin HAVING c > 1) dup ON m.`productName` = dup.`productName` AND m.brandid = dup.brandid AND m.origin = dup.origin
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN groups AS g ON g.productid = m.id
GROUP BY m.id
ORDER BY `productName`,brandid,origin

Any help and/or advice on how to achieve this is highly appricated.

Upvotes: 0

Views: 63

Answers (1)

sagi
sagi

Reputation: 40481

My guess is that you are missing an aggregation function on the subquery on ID field, also - you need to group by productname,origin and brand and not id so try this:

SELECT m.*,dup.id AS dupid,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
FROM `products` m
JOIN (SELECT min(id) as id,`productName`, brandid, origin, COUNT(*) AS c FROM products 
GROUP BY `productName`, brandid, origin HAVING c > 1) dup ON m.`productName` = dup.`productName` AND m.brandid = dup.brandid AND m.origin = dup.origin
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN groups AS g ON g.productid = m.id
GROUP BY m.`productName`,m.brandid,m.origin
ORDER BY m.`productName`,m.brandid,m.origin

EDIT : You can use this query:

SELECT tt.*
FROM(
     SELECT m.*,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
     FROM `products` m
     LEFT JOIN tags AS t ON t.productid = m.id
     LEFT JOIN groups AS g ON g.productid = m.id
     GROUP BY m.id) tt
INNER JOIN
    (SELECT productName,brandid,origin,alltags,groups
     FROM
        (SELECT m.*,GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS groups
         FROM `products` m
         LEFT JOIN tags AS t ON t.productid = m.id
         LEFT JOIN groups AS g ON g.productid = m.id
         GROUP BY m.id) s
    GROUP BY  productName,brandid,origin,alltags,groups
    HAVING COUNT(*) > 1) ss
  ON(tt.productName = ss.productName and tt.brandid = ss.brandid and tt.origin = ss.origin
    and tt.alltags = ss.alltags and tt.groups = ss.groups)

Upvotes: 1

Related Questions