Annie
Annie

Reputation: 131

Retrieving duplicate and original rows from a table using sql query

Say I have a student table with the following fields - student id, student name, age, gender, marks, class.Assume that due to some error, there are multiple entries corresponding to each student. My requirement is to identify the duplicate rows in the table and the filter criterion is the student name and the class.But in the query result, in addition to identifying the duplicate records, I also need to find the original student detail which got duplicated. Is there any method to do this. I went through this answer: SQL: How to find duplicates based on two fields?. But here it only specifies how to find the duplicate rows and not a means to identify the actual row that was duplicated. Kindly throw some light on the possible solution. Thanks.

Upvotes: 0

Views: 1581

Answers (1)

amcdermott
amcdermott

Reputation: 1585

First of all: if the columns you've listed are all in the same table, it looks like your database structure could use some normalization.

In terms of your question: I'm assuming your StudentID field is a database generated, primary key and so has not been duplicated. (If this is not the case, I think you have bigger problems than just duplicates).

I'm also assuming the duplicate row has a higher value for StudentID than the original row.

I think the following should work (Note: I haven't created a table to verify this so it might not be perfect straight away. If it doesn't it should be fairly close)

select dup.StudentID as DuplicateStudentID
       dup.StudentName, dup.Age, dup.Gender, dup.Marks, dup.Class,
       orig.StudentID as OriginalStudentId
  from StudentTable dup
 inner join (   
                -- Find first student record for each unique combination
                select Min(StudentId) as StudentID, StudentName, Age, Gender, Marks, Class
                  from StudentTable t
                 group by StudentName, Age, Gender, Marks, Class
            ) orig on dup.StudentName = orig.StudenName
                  and dup.Age = orig.Age
                  and dup.Gender = orig.Gender
                  and dup.Marks = orig.Marks
                  and dup.Class = orig.Class
                  and dup.StudentID > orig.StudentID -- Don't identify the original record as a duplicate

Upvotes: 1

Related Questions