chhenning
chhenning

Reputation: 2077

Find duplicates in a One to Many table

I have a very simple table:

create table #Person_Tasks( Person_ID int, Task_ID int)

I like to find all people that worked on the EXACT same tasks. For instance, P1 worked on T1 and T2. How can I find all the other people that ONLY worked on T1 and T2?

Here is some sample data:

insert into #Person_Tasks
values
(0,0),
(0,1),
(0,2),
(1,0),
(1,1),
(1,3),
(2,0),
(2,1),
(2,2)

In this example only P0 and P2 have the exact same tasks. P1 has the same number of tasks but one task doesn't match the other people's task ID.

The table is quite big with 1000s of people and 100Ks of Tasks.

Upvotes: 1

Views: 77

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269993

Here is one method, assuming that the rows are unique:

with pt as (
      select pt.*, count(*) over (partition by person_id) as num_tasks
      from person_tasks pt
     )
select pt.person_id, pt2.person_id
from pt left join
     pt pt2
     on pt2.task_id = pt.task_id and pt.person_id <> pt2.person_id
group by pt.person_id, pt2.person_id, pt.num_tasks, pt2.num_tasks
having (count(*) = pt.num_tasks and pt.num_tasks = pt2.num_tasks) or
       pt2.person_id is null

This does a self join and then counts the number of matches for each pair. If all the counts match, then the two people have the same tasks.

Here is a db<>fiddle showing it working.

Upvotes: 4

iSR5
iSR5

Reputation: 3498

based on your sample, this will work as well :

SELECT Person_ID
FROM (
    SELECT *,
    COUNT(Task_ID) OVER(PARTITION BY Person_ID) cnt
    FROM Person_Tasks
    WHERE 
        Task_ID IN(1,2)
) D 
WHERE 
    cnt > 1
GROUP BY Person_ID

Upvotes: 0

Related Questions