Finding an exact match over several rows

Question

I have a table GroupsTable that can be described through a MWE as follows.

GroupID  MemberID
1        42
2        42
2        43
3        42
3        43
3        44

I am then given another table MemberTable that contain some rows with MemberID. For example:

MemberID
42
43

The query I need is one that finds the matching GroupID that has exactly the MemberID from the second table, no more no less. I believe the following code is working, but it is terribly slow, so is there a better way to find the answer?

select c.GroupID from (
  select g.GroupID 
  from GroupTable g 
  join MemberTable m on m.MemberID = g.MemberID
  group by g.GroupID
  having count(*) = (select count(*) from MemberTable)
) c 
left join GroupTable x on x.GroupID = c.GroupID
and x.MemberID not in (select MemberID from MemberTable)
where x.GroupID is null

Sample data:

create table MemberTable (
  MemberID int
)
insert into MemberTable
values 
(42), 
(43);

create table GroupTable (
  GroupID int,
  MemberID int
);
insert into GroupTable 
values 
(1, 42), -- only one member
(2, 42), -- both members
(2, 43), 
(3, 42), -- one member too many
(3, 43),
(3, 44),
(4, 40), -- two irrelevant members
(4, 41);

There is a fiddle available here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=ad818a306600286433634fa83c3628a0

Zohar Peled · Accepted Answer

You can use a combination of left join, group by, having, and count distinct, like this:

DECLARE @Count int;
SELECT @Count = COUNT(*) FROM MemberTable;

SELECT GroupID 
FROM GroupTable As G
LEFT JOIN  MemberTable As M
    ON G.MemberID = M.MemberID
GROUP BY GroupID
HAVING COUNT(DISTINCT G.MemberID) = @Count 
AND  COUNT(DISTINCT M.MemberID) = @Count

The left join ensures you'll get all the records for each group id.

The count of distinct of the groups member ids ensures you will only get the group ids that does not have have member ids that doesn't appear in the members table.

The count distinct of the members table ensures you will not get groups that happen to have the same number of member ids as the members table.

Update

After testing with temp tables on my own environment, I've confirmed my suspicion that the performance killer is the count(distinct). I've changed my query to get rid of it and now it seems to be almost twice as fast as the query in the question:

DECLARE @Count int;
SELECT @Count = COUNT(*) FROM MemberTable;

SELECT GroupID 
FROM 
(
    SELECT DISTINCT GroupID, MemberID
    FROM GroupTable
) As G
LEFT JOIN 
(
    SELECT DISTINCT MemberID
    FROM MemberTable
) As M
    ON G.MemberID = M.MemberID
GROUP BY GroupID
HAVING COUNT(G.MemberID) = @Count 
AND  COUNT(M.MemberID) = @Count;

Note that if the MemberTable is known to always have distinct MemberID values you can get rid of the second derived table and simply left join the first derived table to the MemberTable directly.

Finding an exact match over several rows

Answers (1)

Related Questions