TheLegendaryCopyCoder
TheLegendaryCopyCoder

Reputation: 1842

SQL - Filter redundant duplicates

I have a table "Conflicts" containing two process ID's (processIDA int, ProcessIDB int).

A unique conflict is defined when 2 processes in any order (A/B or B/A) are entered into this "Conflicts" table.

The conflicts table contains duplicates as follows:

[row..1] ProcessIDA=5, ProcessIDB=6

[row..2] ProcessIDB=6, ProcessIDA=5

What I need to do is to filter out the duplicate conflicts so that I am left with only:

[row..1] ProcessIDA=5, ProcessIDB=6

Note: This table's rows can vary between 5 and 50 million records. Once I can successfully filter out the duplicates, the amount of rows will be exactly half of the current.

Upvotes: 0

Views: 155

Answers (2)

Ullas
Ullas

Reputation: 11556

If you want to delete the duplicates,then

Query

;with cte as
(
  select *,
  case when ProcessIDA < ProcessIDB
  then ProcessIDA else ProcessIDB end as column1,
  case when ProcessIDA < ProcessIDB
  then ProcessIDB else ProcessIDA end as column2
  from conflicts
),
cte2 as
(
    select  rn = row_number() over
    (
        partition by cte.column1,cte.column2
        order by cte.column1
    ),*
    from cte
)
delete from cte2
where rn > 1;

SQL Fiddle

Upvotes: 3

Rubens Farias
Rubens Farias

Reputation: 57996

You can do a simple self join

;WITH   Conflicts   AS
(
    SELECT      *
    FROM    (   VALUES
                (6, 5),
                (5, 6),
                (1, 2),
                (1, 3)
            )   Sample (ProcessIDA, ProcessIDB)
)
SELECT  A.*
FROM    Conflicts A
JOIN    Conflicts B
    ON  A.ProcessIDA = B.ProcessIDB AND
        A.ProcessIDB = B.ProcessIDA

Upvotes: 1

Related Questions