Graeme Kelly
Graeme Kelly

Reputation: 61

sql(ite) - grouping data on same table

I have some data in a table called UniqueCrash, which has a hash value, and an issue_id (it's an external bug tracker ID).

What I want to find is a list of UniqueCrash rows where the same hash value - since it is a hash of the call stack - has been assigned to more than one bug tracker ID (because it means that a duplicate bug has been logged).

My test data looks like this (ID, hash, Issue ID):


1 | 12345 | Bug 1
2 | 12345 | Bug 2
3 | 12345 | Bug 3
4 | 123456 | Bug 4

And I want my result to look like this:


1 | 12345 | Bug 1
2 | 12345 | Bug 2
3 | 12345 | Bug 3

(i.e. to just remove the last entry - that hash is only used once)

I don't really know what this technique is called, if indeed there is a name for it. So far I have a query that seems to give almost the right data back:

SELECT UniqueCrash.id, hash, issue_id 
FROM UniqueCrash
WHERE hash IN 
    ( SELECT hash FROM 
    ( SELECT UniqueCrash.id, hash, issue_id 
        FROM UniqueCrash WHERE
            UniqueCrash.issue_id IS NOT NULL
        GROUP BY hash, issue_id ) 
    GROUP BY hash HAVING COUNT(issue_id) > 1 )
ORDER BY hash ASC, id ASC

But whilst it gives the 3 correct (by ID) rows back, it looks like the issue ID for row 1 vs. row 2 are round the wrong way and I can't see why.

Can anyone suggest a better way to rewrite this query (and one that actually works)?

Upvotes: 0

Views: 92

Answers (1)

blackbishop
blackbishop

Reputation: 32640

Just use COUNT DISTINCT to do that :

SELECT `id`, `hash`, `issue_id` 
FROM UniqueCrash 
WHERE `hash` IN (
             SELECT `hash` from UniqueCrash
             GROUP BY `hash`
             HAVING COUNT(DISTINCT `issue_id`) > 1)
ORDER BY `hash` ASC, `id`  ASC

See DEMO HERE

Upvotes: 1

Related Questions