Find duplicates in a One to Many table

Question

I have a very simple table:

create table #Person_Tasks( Person_ID int, Task_ID int)

I like to find all people that worked on the EXACT same tasks. For instance, P1 worked on T1 and T2. How can I find all the other people that ONLY worked on T1 and T2?

Here is some sample data:

insert into #Person_Tasks
values
(0,0),
(0,1),
(0,2),
(1,0),
(1,1),
(1,3),
(2,0),
(2,1),
(2,2)

In this example only P0 and P2 have the exact same tasks. P1 has the same number of tasks but one task doesn't match the other people's task ID.

The table is quite big with 1000s of people and 100Ks of Tasks.

Gordon Linoff · Accepted Answer

Here is one method, assuming that the rows are unique:

with pt as (
      select pt.*, count(*) over (partition by person_id) as num_tasks
      from person_tasks pt
     )
select pt.person_id, pt2.person_id
from pt left join
     pt pt2
     on pt2.task_id = pt.task_id and pt.person_id <> pt2.person_id
group by pt.person_id, pt2.person_id, pt.num_tasks, pt2.num_tasks
having (count(*) = pt.num_tasks and pt.num_tasks = pt2.num_tasks) or
       pt2.person_id is null

This does a self join and then counts the number of matches for each pair. If all the counts match, then the two people have the same tasks.

Here is a db<>fiddle showing it working.

Find duplicates in a One to Many table

Answers (2)

Related Questions