Andrew Arthur
Andrew Arthur

Reputation: 1603

SQL Group By with multiple columns - return all rows where the combination is greater than 1

I have a table with the following data:

sequence    subSequence    Note
1           1              'A'
1           2              'B'
2           1              'C'
2           4              'D'
2           4              'E'
2           3              'F'
3           3              'G'
3           3              'H'
3           3              'I'

The following query returns rows where the combination of sequence and subSequence is greater than one.

SELECT sequence,subSequence,count(*) AS _rowCount
FROM [MyTable]
GROUP BY sequence,subSequence
HAVING count(*)>1

Result:

sequence subSequence _rowCount
2        4           2
3        3           3

I would like to return all rows where the combinations of sequence and subSequence are greater than one.

Result:

sequence subSequence Note
2        4           'D'
2        4           'E'
3        3           'G'
3        3           'H'
3        3           'I'

I've tried the following query but it's taking quite a while to execute and I'm not sure if it's correct. So I either need help optimizing it, or correcting it or any alternative queries:

SELECT *
FROM [MyTable]
WHERE sequence IN
        (SELECT sequence
        FROM [MyTable]
        GROUP BY sequence,subSequence
        HAVING count(*)>1)
    AND subSequence IN
        (SELECT subSequence
        FROM [MyTable]
        GROUP BY sequence,subSequence
        HAVING count(*)>1)

Upvotes: 0

Views: 1102

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269763

Use window functions:

SELECT t.*
FROM (SELECT t.*,
             COUNT(*) OVER (PARTITION BY sequence, subSequence) AS rowCount
       FROM [MyTable]
     ) t
WHERE rowCount > 1;

Upvotes: 2

scragar
scragar

Reputation: 6824

SELECT [MyTable].*, MySubQuery.rowCount
FROM [MyTable]
JOIN (
    SELECT sequence,subSequence,count(*) AS rowCount
    FROM [MyTable]
    GROUP BY sequence,subSequence
    HAVING count(*)>1
) As MySubQuery
ON MySubQuery.sequence = [MyTable].sequence
AND MySubQuery.subSequence= [MyTable].subSequence

Use a subquery and join it.

If this is still running slow I would recommend creating an index on sequence,subSequence like so:

 ALTER TABLE [MyTable]
 ADD INDEX idxSeqSubSeq (sequence,subSequence);

This will let the database engine take advantage of index to avoid looking at rows it doesn't need to consider.

Upvotes: 2

Related Questions