Reputation: 677
i am using sql server 2005. i have a table contains duplicated rows. how can i eliminated those duplicate rows in that table? for e.g., the table may contain 3 similar rows in which i want to delete 2 rows and keep the original
Upvotes: 1
Views: 546
Reputation: 2470
For understanding purpose, lets take a simple table Employee with below schema
EmployeeId - int
EmployeeName varchar(50)
Age int
Lets populate with duplicate values. Please note primary key is not duplicated in this case
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (1,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (2,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (3,'Sam',24)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (4,'Mark',20)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (5,'Tom',22)
INSERT INTO Employee(EmployeeId,EmployeeName,Age) VALUES (6,'Tom',22)
GO
we can make use of CTE in finding the duplicate rows. Gather duplicate rows by using Group by/Count statement. Once Duplicate rows are Identified, we select those rows from the main table using join condition. Now rank those Rows and delete all the rows apart from the rows with rank 1. I find this a lot more elegant.
WITH TotalDuplicates(EmployeeName,Age,Total) AS
(
SELECT EmployeeName,Age,COUNT(employeeId) AS Total FROM Employee
GROUP BY EmployeeName,Age
HAVING COUNT(employeeId) > 1
)
,DistinctRows(EmployeeId,EmployeeName,Age) AS
(
SELECT E.EmployeeId,E.EmployeeName,E.Age FROM Employee AS E
INNER JOIN TotalDuplicates AS D
ON (E.EmployeeName = D.EmployeeName AND E.Age = D.Age)
)
,OrderedDuplicateTables(EmployeeId,EmployeeName,Age,Ranking) AS
(
SELECT
EmployeeId,
EmployeeName,
Age,
RANK() OVER (PARTITION BY EmployeeName, Age ORDER BY EmployeeId DESC)
FROM DistinctRows
)
DELETE FROM Employee
WHERE EmployeeId IN (SELECT EmployeeId FROM OrderedDuplicateTables WHERE Ranking > 1)
Upvotes: 0
Reputation: 11309
First you can copy duplicate record into another table like as following way
Select fieldnames into #temp from table1 group by fieldnames having count (*) > 1
then remove that record from original table
delete from table1 where fieldname in (select fieldnames from #temp)
and finally copy record from temporary table to original source table.
insert table1 select * from #temp
through above mentioned steps you can eliminate duplicate record from table.
Upvotes: 0
Reputation: 333
If you just want to eliminate duplicate records from your result set, you can use the DISTINCT command:
SELECT DISTINCTI field1, field2 FROM...
If you want to delete those duplicate records, you can use COUNT to detect which records have more than one instance, and then deleting them with a subquery
Upvotes: 1