Reputation: 3243
Probably be better if I explain via data :)
Say if I have a table
ItemId Identifier1 Identifier2
1 'Test 1' NULL
1 'Test 2' NULL
So ItemId 1
is identified by 'Test 1' AND 'Test 2'. This could be a combination of Identifer1 or Identifier 2 e.g.
ItemId Identifier1 Identifier2
1 'Test 1' NULL
1 NULL 1
In this case ItemId 1
is identified by 'Test 1' and the number 1. And Item can have as many rows to identify as is possible.
I want to find all records (in a table with same structure as above) which have duplicate identifiers.
My current incorrect approach validates with just one row, so
ItemId Identifier1 Identifier2
1 'Test 1' NULL
1 'Test 2' NULL
and
ItemId Identifier1 Identifier2
2 'Test 1' NULL
2 'Test 3' NULL
are coming back as duplicate records because of the 'Test 1' match. This is incorrect. It needs to check all rows.
I thought about using an 'ALL' but that didn't help! I imagine I would need to use some magic with an aggregate/window function, but I cannot come up with the correct query.
Please help!
Upvotes: 0
Views: 416
Reputation: 1269633
You want to find all items where all rows for the item match exactly the same rows for another item.
This is more complicated because you have two columns identifying each row and each NULL
s are important. But not too complicated:
SELECT t1.itemid, t2.itemid
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY itemid) AS cnt
FROM table t
) t1 FULL OUTER JOIN
(SELECT t.*, COUNT(*) OVER (PARTITION BY itemid) AS cnt
FROM table t
) t2
ON t1.itemId <> t2.itemId AND t1.cnt = t2.cnt AND
(t1.identifier1 = t2.identifier1 OR t1.identifier1 IS NULL AND t2.identifier1 IS NULL) AND
(t1.identifier2 = t2.identifier2 OR t1.identifier2 IS NULL AND t2.identifier2 IS NULL)
GROUP BY t1.itemid, t2.itemid
HAVING COUNT(*) = COUNT(t1.itemid) AND COUNT(*) = COUNT(t2.itemid);
Upvotes: 1