Poonam Singhania
Poonam Singhania

Reputation: 347

Self Join query to find duplicate records

I have a table with around 32 columns and I want to check duplicate rows but two columns are unique so they will be different for each row. So I need JOIN condition basically self join with duplicate records where all columns are equal except two.

Upvotes: 3

Views: 14163

Answers (2)

TT.
TT.

Reputation: 16146

To find the rows that are duplicate the query would be (col1...col30 would be excluding the two columns that are always unique for each row)

SELECT
  col1,
  col2,
  ...,
  col30
FROM
  your_table
GROUP BY
  col1,
  col2,
  ...,
  col30
HAVING
  COUNT(*)>1

EDIT: It appears from the remarks you want to delete duplicate rows. Say you have columns id1 and id2 that are unique to every row (ie a primary key) and columns col1..col4 that can be duplicate. Take the following script as an example to delete duplicate rows:

CREATE TABLE #tt(
    id1 INT,
    id2 INT,
    col1 INT,
    col2 INT,
    col3 INT,
    col4 INT,
    PRIMARY KEY(id1,id2)
);

INSERT INTO #tt(id1,id2,col1,col2,col3,col4)
VALUES (1,1,1,1,1,1),
       (1,2,1,1,1,1),
       (1,3,1,1,1,1),
       (2,1,1,1,1,1),
       (3,1,2,2,2,2),
       (4,1,2,2,2,2),
       (2,2,2,2,2,2),
       (3,3,3,3,3,3),
       (4,2,3,3,3,3);

DELETE
    t
FROM
    #tt AS t
    INNER JOIN (
        SELECT
            id1,
            id2,
            ROW_NUMBER() OVER (PARTITION BY col1,col2,col3,col4 ORDER BY id1,id2) AS rn
        FROM
            #tt
    ) AS trn ON
        trn.id1=t.id1 AND
        trn.id2=t.id2
WHERE
    trn.rn>1;

SELECT*FROM #tt;

DROP TABLE #tt;

Upvotes: 4

shA.t
shA.t

Reputation: 16968

You can also use something like this:

SELECT *
FROM yourTable t
WHERE 1 < (
    SELECT COUNT(*) 
    FROM yourTable ti
    WHERE t.uniqueCol1 = ti.uniqueCol1 AND t.uniqueCol2 = ti.uniqueCol2
    GROUP BY nonUniqueCol1, nonUniqueCol2, ..., nonUniqueColn)

or for filtering just duplicated rows without showing original rows you can use:

SELECT *
FROM (SELECT *,
          ROW_NUMBER() OVER (PARTITION BY nonUniqueCol1, nonUniqueCol2, ..., nonUniqueColn
                             ORDER BY uniqueCol1, uniqueCol2) As seq
      FROM yourTable) t
WHERE (seq > 1)

Upvotes: 0

Related Questions