SQL - Filter redundant duplicates

Question

I have a table "Conflicts" containing two process ID's (processIDA int, ProcessIDB int).

A unique conflict is defined when 2 processes in any order (A/B or B/A) are entered into this "Conflicts" table.

The conflicts table contains duplicates as follows:

[row..1] ProcessIDA=5, ProcessIDB=6

[row..2] ProcessIDB=6, ProcessIDA=5

What I need to do is to filter out the duplicate conflicts so that I am left with only:

[row..1] ProcessIDA=5, ProcessIDB=6

Note: This table's rows can vary between 5 and 50 million records. Once I can successfully filter out the duplicates, the amount of rows will be exactly half of the current.

Ullas · Accepted Answer

If you want to delete the duplicates,then

Query

;with cte as
(
  select *,
  case when ProcessIDA < ProcessIDB
  then ProcessIDA else ProcessIDB end as column1,
  case when ProcessIDA < ProcessIDB
  then ProcessIDB else ProcessIDA end as column2
  from conflicts
),
cte2 as
(
    select  rn = row_number() over
    (
        partition by cte.column1,cte.column2
        order by cte.column1
    ),*
    from cte
)
delete from cte2
where rn > 1;

SQL Fiddle

SQL - Filter redundant duplicates

Answers (2)

Related Questions