How to delete a duplicate record without using primary key

Question

I went for an interview today where they give me technical test on sql. One of them was how to delete duplicate records without a primary key.

For one I can't imagine a table without a primary key. Yes I have read the existing threads on this. Say this happened and needed to be fixed Now. Couldn't I just add to the end of the table a automatically incrementing id then use that to delete the duplicate record?

Can anyone think of a reason why that won't work? I tried it on a simple database I created and I can't see any problems

ydaetskcoR · Accepted Answer

You've got a couple of options here.

If they don't mind you dropping the table you could SELECT DISTINCT * from the table in question and then INSERT this into a new table, DROPping the old table as you go. This obviously won't be usable in a Production database but can be useful for where someone has mucked up a routine that's populating a data warehouse for example.

Alternatively you could effectively create a temporary index by using the row number as per this answer. That answer shows you how to use the built in row_number() function in SQL server but could be replicated in other RDBMS' (not sure which but MySQL certainly) by declaring a variable called @row_num or equivalent and then using it in your SELECT statement as:

SET @row_num=0;
SELECT @row_num:=@row_num+1 AS row_num, [REMAINING COLUMNS GO HERE]

How to delete a duplicate record without using primary key

Answers (2)

Related Questions