Reputation: 2101

How to use GROUP BY to identify row differences

I'm working in SQL Server 2008. I have the following situation. I have a table that has 2 columns which comprise the primary key. (No uniqueness constraint is defined on the key, though.) I know that I have primary key duplicates. Per primary key, I want to identify the distinct values in another column. So, let's say I have the following table:

INSERT INTO some_table (Col1, Col2, COl3) VALUES
('A', '1', 'a'),
('A', '1', 'b'),
('B', '1', 'a'),
('B', '2', 'b'),
('C', '1', 'a'),
('C', '1', 'a'),
('C', '2', 'b')

I want to group by Col1 and Col2, and I want to find all rows where there are more than 1 distinct Col3 values. For example, with the above table, I expect to see: (A, 1, a), (A, 1, b).

How do I write this SQL query? My SELECT statement needs to include Col1, Col2, and Col3. But, if I do a GROUP BY Col1, Col2, then I can't have Col3 in the SELECT statement.

Upvotes: 0

Answers (4)

KekuSemau

Reputation: 6854

This is one way to get there:

SELECT *
FROM
   T T1
WHERE
    EXISTS(SELECT * FROM T T2
       WHERE 
         T1.a = T2.a 
         AND T1.b = T2.b
         AND T1.c <> T2.c);

sql fiddle

or in similar variation, that would allow to give the required minimum distinct number:

WHERE
(SELECT COUNT(DISTINCT T2.c)
 FROM T T2
 WHERE T1.a = T2.a AND T1.b = T2.b) >= 2;

sql fiddle

Upvotes: 0

Arkadiusz

Reputation: 489

Try this:

    CREATE TABLE #some_table
    (
    Col1 char(1),
    Col2 char(1),
    Col3 char(1)
    )

    INSERT INTO #some_table (Col1, Col2, COl3) VALUES
    ('A', '1', 'a'),
    ('A', '1', 'b'),
    ('B', '1', 'a'),
    ('B', '2', 'b'),
    ('C', '1', 'a'),
    ('C', '1', 'a'),
    ('C', '2', 'b')



    select
      stuff((select ',' + '('+Col1 +', '+ Col2 + ', ' + Col3 + ')' from #some_table T_IN
       where T.col1 = T_IN.col1 and T.col2 = T_IN.col2 FOR XML PATH ('')), 1, 1, '') as items
    from
      #some_table T
    group by col1, col2

Upvotes: 0

Zohar Peled

Reputation: 82484

Here is one way to solve this:

;WITH CTE AS (
    Select col1, col2, min(col3) as minvalue, max(col3) as maxvalue
    From myTable
    Group by col1, col2
    Having min(col3) < max(col3)
)

Select * 
From myTable t
Inner join cte 
On t.col1 = cte.col1
     And t.col2 = cte.col2
Where col3 >= minvalue
And col3 <= maxvalue

Note code written directly here, there might be some mistakes.

Upvotes: 0

James Z

Reputation: 12317

You can't really have more than one item selected per group in group by, but maybe you need something like this:

select
  col1,
  col2,
  stuff((select ',' + col3 from some_table t2
   where t1.col1 = t2.col1 and t1.col2 = t2.col2
   FOR XML PATH ('')), 1, 1, '') as items
from
  some_table t1
group by
  col1,
  col2
having count(distinct col3) > 1

This will return the "duplicate" items in comma separated list in the 3rd column.

SQL Fiddle

Upvotes: 1

How to use GROUP BY to identify row differences

Answers (4)

Related Questions