Reputation: 71

Delete duplicate data and load into another table in SQL Server

I have a question about SQL Server.

Table: emp

empid   |  name |sal
1       |  abc  |100
2       |  def  |200
3       |  test |300
2       |  har  |500
3       |  jai  |600
4       | kali  |240

This table has duplicate data based on above table I want delete duplicate data from the emp table

And duplicate data should be loaded into empduplicate table.

Here empid is unique. If empid shows up multiple times, then that record is considered a duplicate.

empduplicate structure look like this:

Empid   |  name  | sal

Finally after deleting duplicate data, I want see data in emp table look like this:

empid  |  name  | sal 
1      |  abc   | 100
4      | kali   | 240

For deleting the duplicates, I tried this code:

;with duplicate as 
(
    select 
        *,
        row_number()over (partition by empid order by empid) as rn
    from emp
)
delete from duplicate 
where rn > 1

But I am not able to delete entire records.

Example: empid=2 has duplicate data

empid|name |sal
2    |def  |200
2    |har  |500

I need to delete entire empid=2 corresponding records. empid=2 has duplicate and need to delete it from the emp table.

And empduplicate table need to load duplicate data look like this:

empid    | name   |sal
2        |def     |200
2        |har     |500
3        |test    |300
3        |jai     |600

For inserting the duplicate data I tried this code:

insert into empduplicate 
    select 
        id, name, sal 
    from 
         emp  
    group by 
         id 
    having 
         count(*) > 1

That query throws an error:

Column 'duplicate.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Please tell me how to write query to achieve my task in SQL Server

Upvotes: 5

Answers (3)

BalajiShriram

Reputation: 1

BEGIN TRAN
SELECT * INTO empduplicate FROM
(
 SELECT * 
 FROM emp
 WHERE empid IN (
  SELECT empid FROM emp 
  GROUP BY empid
  HAVING COUNT(empid)>1
 )
) as M

DELETE FROM emp WHERE empid IN (
 SELECT empid FROM emp
 GROUP BY empid
 HAVING COUNT(empid)>1
)

COMMIT TRAN

Upvotes: 0

Amee

Reputation: 347

SELECT DISTINCT * INTO #tmp FROM emp
 DELETE FROM emp
INSERT INTO emp
SELECT * FROM #tmp DROP table #tmp

SELECT * FROM emp ---------------------------- All Distinct ID

SELECT * INTO #tmp FROM emp
WHERE empid in(
    SELECT empid FROM emp 
    group by empid having count(*) = 1
)
DELETE FROM emp
INSERT INTO emp
SELECT * FROM #tmp DROP table #tmp

SELECT * FROM emp ----------------------------All ID which is not duplicate

INSERT INTO empduplicate  
  SELECT * FROM emp where empid in(
    SELECT empid FROM emp 
    group by empid having count(*) >1
)

SELECT * FROM empduplicate  -------------------ALL Duplicate value.

Upvotes: 0

Felix Pamittan

Reputation: 31879

You're almost there. Instead of using ROW_NUMBER, use COUNT:

WITH CteInsert AS(
    SELECT *,
        cnt = COUNT(empid) OVER(PARTITION BY empid)
    FROM emp
)
INSERT INTO empduplicate(empid, name, sal)
SELECT
    empid, name, sal
FROM CteInsert
WHERE cnt > 1;

WITH CteDelete AS(
    SELECT *,
        cnt = COUNT(empid) OVER(PARTITION BY empid)
    FROM emp
)
DELETE FROM CteDelete WHERE cnt > 1;

You need to execute the INSERT first before the DELETE. Additionally, you might want to enclose this in a single transaction.

Upvotes: 2

Delete duplicate data and load into another table in SQL Server

Answers (3)

Related Questions