Reputation: 1842
I have a table "Conflicts" containing two process ID's (processIDA int, ProcessIDB int).
A unique conflict is defined when 2 processes in any order (A/B or B/A) are entered into this "Conflicts" table.
The conflicts table contains duplicates as follows:
[row..1] ProcessIDA=5, ProcessIDB=6
[row..2] ProcessIDB=6, ProcessIDA=5
What I need to do is to filter out the duplicate conflicts so that I am left with only:
[row..1] ProcessIDA=5, ProcessIDB=6
Note: This table's rows can vary between 5 and 50 million records. Once I can successfully filter out the duplicates, the amount of rows will be exactly half of the current.
Upvotes: 0
Views: 155
Reputation: 11556
If you want to delete the duplicates,then
Query
;with cte as
(
select *,
case when ProcessIDA < ProcessIDB
then ProcessIDA else ProcessIDB end as column1,
case when ProcessIDA < ProcessIDB
then ProcessIDB else ProcessIDA end as column2
from conflicts
),
cte2 as
(
select rn = row_number() over
(
partition by cte.column1,cte.column2
order by cte.column1
),*
from cte
)
delete from cte2
where rn > 1;
Upvotes: 3
Reputation: 57996
You can do a simple self join
;WITH Conflicts AS
(
SELECT *
FROM ( VALUES
(6, 5),
(5, 6),
(1, 2),
(1, 3)
) Sample (ProcessIDA, ProcessIDB)
)
SELECT A.*
FROM Conflicts A
JOIN Conflicts B
ON A.ProcessIDA = B.ProcessIDB AND
A.ProcessIDB = B.ProcessIDA
Upvotes: 1