user1704276
user1704276

Reputation: 51

Group by column A but comparing column B

This one has had me stumped for the last few hours and at this stage I think I need some help...

I need to compare multiple groups from a single table and to identify where items listed in col B match. For example: -

Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana

I want 'John' and 'David' returned because their items in col B match. Hope this makes sense! Thanks in advance! G

Upvotes: 5

Views: 178

Answers (3)

paparazzo
paparazzo

Reputation: 45096

ColA1 matches ColA2 if:
Count (ColA1) = Count (ColA2) = Count (ColA1 x ColA2)

This approach attempts to optimize query speed.

Materialize the raw count as it is used more than once and can declare a PK.
(a CTE is just syntax and is evaluated)

The where RA.rawcount = RB.rawcount allows for only evaluate the join if the counts are equal. And the query plan indicates that it is performed first.

create table #rawcount
(ColA varchar(50) not null primary key, rawcount int  not null)  
insert into #rawcount
select   [ColA], COUNT(*) as  [rawCount]
from     [tbl]
group by [ColA]
order by [ColA]

select a.ColA as ColA1, b.ColA as ColA2, COUNT(*) [matchcount]
from tbl A
join tbl B
 on  a.ColB = b.ColB 
 and a.ColA < b.ColA
join #rawcount RA 
 on  RA.ColA = A.ColA
join #rawcount RB 
 on  RB.ColA = B.ColA
where RA.rawcount = RB.rawcount  -- only evaluate if count same
group by a.ColA, b.ColA, RA.rawcount
having COUNT(*) = RA.rawcount 

Upvotes: 0

RichardTheKiwi
RichardTheKiwi

Reputation: 107696

Here's the SQL Fiddle for this solution, so that you can play with it yourself.

 select A.ColA Person1, B.ColA Person2
    from (select ColA, count(ColB) CountBs
          from tbl
          group by ColA) G1
    join (select ColA, count(ColB) CountBs
          from tbl
          group by ColA) G2 on G1.ColA < G2.ColA
                           and G1.CountBs = G2.CountBs
    join tbl A on A.ColA = G1.ColA
    join tbl B on B.ColA = G2.ColA and A.ColB = B.ColB
group by A.ColA, B.ColA, G1.CountBs
having count(distinct A.ColB) = G1.CountBs

-- subqueries G1 and G2 are the same and count the expected colB's per colA
-- G1 and G2 are joined together to get the candidate matches
--    of ColA with the same number of ColB's
-- we then use G1 and G2 to join into tbl, and further join
--    between A and B where the ColB's match
-- finally, we count the matches between A and B and make sure the counts match
--    the expected count of B's for the pairing

Upvotes: 6

d89761
d89761

Reputation: 1434

All the people who have an item in column b that is matched to more than on person (I'm assuming you are looking for possibly more than just 2 matches?):

SELECT tableName.ColA, tableName.ColB
FROM (SELECT ColB
    FROM tableName
    GROUP BY ColB
    HAVING COUNT(1) > 1) fruits
INNER JOIN tableName ON fruits.ColB = tableName.ColB
ORDER BY tableName.ColB, tableName.ColA

Upvotes: 0

Related Questions