Reputation: 873
I have a table GroupsTable
that can be described through a MWE as follows.
GroupID MemberID
1 42
2 42
2 43
3 42
3 43
3 44
I am then given another table MemberTable
that contain some rows with MemberID
. For example:
MemberID
42
43
The query I need is one that finds the matching GroupID
that has exactly the MemberID
from the second table, no more no less. I believe the following code is working, but it is terribly slow, so is there a better way to find the answer?
select c.GroupID from (
select g.GroupID
from GroupTable g
join MemberTable m on m.MemberID = g.MemberID
group by g.GroupID
having count(*) = (select count(*) from MemberTable)
) c
left join GroupTable x on x.GroupID = c.GroupID
and x.MemberID not in (select MemberID from MemberTable)
where x.GroupID is null
Sample data:
create table MemberTable (
MemberID int
)
insert into MemberTable
values
(42),
(43);
create table GroupTable (
GroupID int,
MemberID int
);
insert into GroupTable
values
(1, 42), -- only one member
(2, 42), -- both members
(2, 43),
(3, 42), -- one member too many
(3, 43),
(3, 44),
(4, 40), -- two irrelevant members
(4, 41);
There is a fiddle available here: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=ad818a306600286433634fa83c3628a0
Upvotes: 1
Views: 41
Reputation: 82474
You can use a combination of left join, group by, having, and count distinct, like this:
DECLARE @Count int;
SELECT @Count = COUNT(*) FROM MemberTable;
SELECT GroupID
FROM GroupTable As G
LEFT JOIN MemberTable As M
ON G.MemberID = M.MemberID
GROUP BY GroupID
HAVING COUNT(DISTINCT G.MemberID) = @Count
AND COUNT(DISTINCT M.MemberID) = @Count
The left join ensures you'll get all the records for each group id.
The count of distinct of the groups member ids ensures you will only get the group ids that does not have have member ids that doesn't appear in the members table.
The count distinct of the members table ensures you will not get groups that happen to have the same number of member ids as the members table.
Update
After testing with temp tables on my own environment, I've confirmed my suspicion that the performance killer is the count(distinct). I've changed my query to get rid of it and now it seems to be almost twice as fast as the query in the question:
DECLARE @Count int;
SELECT @Count = COUNT(*) FROM MemberTable;
SELECT GroupID
FROM
(
SELECT DISTINCT GroupID, MemberID
FROM GroupTable
) As G
LEFT JOIN
(
SELECT DISTINCT MemberID
FROM MemberTable
) As M
ON G.MemberID = M.MemberID
GROUP BY GroupID
HAVING COUNT(G.MemberID) = @Count
AND COUNT(M.MemberID) = @Count;
Note that if the MemberTable
is known to always have distinct MemberID values you can get rid of the second derived table and simply left join the first derived table to the MemberTable
directly.
Upvotes: 1