Reputation: 61
I have some data in a table called UniqueCrash, which has a hash value, and an issue_id (it's an external bug tracker ID).
What I want to find is a list of UniqueCrash rows where the same hash value - since it is a hash of the call stack - has been assigned to more than one bug tracker ID (because it means that a duplicate bug has been logged).
My test data looks like this (ID, hash, Issue ID):
1 | 12345 | Bug 1
2 | 12345 | Bug 2
3 | 12345 | Bug 3
4 | 123456 | Bug 4
And I want my result to look like this:
1 | 12345 | Bug 1
2 | 12345 | Bug 2
3 | 12345 | Bug 3
(i.e. to just remove the last entry - that hash is only used once)
I don't really know what this technique is called, if indeed there is a name for it. So far I have a query that seems to give almost the right data back:
SELECT UniqueCrash.id, hash, issue_id
FROM UniqueCrash
WHERE hash IN
( SELECT hash FROM
( SELECT UniqueCrash.id, hash, issue_id
FROM UniqueCrash WHERE
UniqueCrash.issue_id IS NOT NULL
GROUP BY hash, issue_id )
GROUP BY hash HAVING COUNT(issue_id) > 1 )
ORDER BY hash ASC, id ASC
But whilst it gives the 3 correct (by ID) rows back, it looks like the issue ID for row 1 vs. row 2 are round the wrong way and I can't see why.
Can anyone suggest a better way to rewrite this query (and one that actually works)?
Upvotes: 0
Views: 92
Reputation: 32640
Just use COUNT DISTINCT
to do that :
SELECT `id`, `hash`, `issue_id`
FROM UniqueCrash
WHERE `hash` IN (
SELECT `hash` from UniqueCrash
GROUP BY `hash`
HAVING COUNT(DISTINCT `issue_id`) > 1)
ORDER BY `hash` ASC, `id` ASC
See DEMO HERE
Upvotes: 1