Self Join query to find duplicate records

Question

I have a table with around 32 columns and I want to check duplicate rows but two columns are unique so they will be different for each row. So I need JOIN condition basically self join with duplicate records where all columns are equal except two.

TT. · Accepted Answer

To find the rows that are duplicate the query would be (col1...col30 would be excluding the two columns that are always unique for each row)

SELECT
  col1,
  col2,
  ...,
  col30
FROM
  your_table
GROUP BY
  col1,
  col2,
  ...,
  col30
HAVING
  COUNT(*)>1

EDIT: It appears from the remarks you want to delete duplicate rows. Say you have columns id1 and id2 that are unique to every row (ie a primary key) and columns col1..col4 that can be duplicate. Take the following script as an example to delete duplicate rows:

CREATE TABLE #tt(
    id1 INT,
    id2 INT,
    col1 INT,
    col2 INT,
    col3 INT,
    col4 INT,
    PRIMARY KEY(id1,id2)
);

INSERT INTO #tt(id1,id2,col1,col2,col3,col4)
VALUES (1,1,1,1,1,1),
       (1,2,1,1,1,1),
       (1,3,1,1,1,1),
       (2,1,1,1,1,1),
       (3,1,2,2,2,2),
       (4,1,2,2,2,2),
       (2,2,2,2,2,2),
       (3,3,3,3,3,3),
       (4,2,3,3,3,3);

DELETE
    t
FROM
    #tt AS t
    INNER JOIN (
        SELECT
            id1,
            id2,
            ROW_NUMBER() OVER (PARTITION BY col1,col2,col3,col4 ORDER BY id1,id2) AS rn
        FROM
            #tt
    ) AS trn ON
        trn.id1=t.id1 AND
        trn.id2=t.id2
WHERE
    trn.rn>1;

SELECT*FROM #tt;

DROP TABLE #tt;

Self Join query to find duplicate records

Answers (2)

Related Questions