GIS-Jonathan
GIS-Jonathan

Reputation: 4637

How to delete duplicate rows from an Oracle Database?

We have a table that has had the same data inserted into it twice by accident meaning most (but not all) rows appears twice in the table. Simply put, I'd like an SQL statement to delete one version of a row while keeping the other; I don't mind which version is deleted as they're identical.

Table structure is something like:

FID, unique_ID, COL3, COL4....

Unique_ID is the primary key, meaning each one appears only once. FID is a key that is unique to each feature, so if it appears more than once then the duplicates should be deleted.

To select features that have duplicates would be:

select count(*) from TABLE GROUP by FID

Unfortunately I can't figure out how to go from that to a SQL delete statement that will delete extraneous rows leaving only one of each.

This sort of question has been asked before, and I've tried the create table with distinct, but how do I get all columns without naming them? This only gets the single column FID and itemising all the columns to keep gives an: ORA-00936: missing expression

CREATE TABLE secondtable NOLOGGING as select distinct FID from TABLE

Upvotes: 8

Views: 32189

Answers (6)

Christian Bongiorno
Christian Bongiorno

Reputation: 5648

I know this is an old question, I came up with a different solution entirely:

partition them by what makes them duplicate and then use that as a row number to throw away the extras:

delete from MY_TABLE where unique_id in (
    select unique_id
    from (
        select mt.*,
           row_number() over (
               partition by mt.RAW_VALUE,mt.END_DATE_TRUNC_UTC
               order by mt.END_DATE_TRUNC_UTC
           ) rn
        FROM
        mytable mt
    ) where rn > 1
);

So, in my case above, the fields that defined data as duplicate were the RAW_VALUE and the END_DATE_TRUNC_UTC This gave duplicate rows unique row numbers within their group. Then, I could just throw away everything except rn = 1

Upvotes: 1

GongchuangSu
GongchuangSu

Reputation: 29

You can try this.

delete from tablename a
where a.logid, a.pointid, a.routeid) in (select logid, pointid, routeid from tablename 
group by logid, pointid, routeid having count(*) > 1)
and rowid not in (select min(rowid) from tablename
group by logid, pointid, routeid having count(*) > 1)

Upvotes: 1

Shahriar Hasan Sayeed
Shahriar Hasan Sayeed

Reputation: 5935

Try with this.

DELETE FROM firsttable WHERE unique_ID NOT IN 
(SELECT MAX(unique_ID) FROM firsttable GROUP BY FID)

EDIT: One explanation:

SELECT MAX(unique_ID) FROM firsttable GROUP BY FID;

This sql statement will pick each maximum unique_ID row from each duplicate rows group. And delete statement will keep these maximum unique_ID rows and delete other rows of each duplicate group.

Upvotes: 0

bitoshi.n
bitoshi.n

Reputation: 2318

Try this

DELETE FROM table_name A WHERE ROWID > (
SELECT min(rowid) FROM table_name B
WHERE A.FID = B.FID)

Upvotes: 4

Justin Cave
Justin Cave

Reputation: 231651

If you don't care which row is retained

DELETE FROM your_table_name a
 WHERE EXISTS( SELECT 1
                 FROM your_table_name b
                WHERE a.fid = b.fid
                  AND a.unique_id < b.unique_id )

Once that's done, you'll want to add a constraint to the table that ensures that FID is unique.

Upvotes: 10

hogni89
hogni89

Reputation: 1754

A suggestion

DELETE FROM x WHERE ROWID IN
(WITH y AS (SELECT xCOL, MIN(ROWID) FROM x GROUP BY xCOL HAVING COUNT(xCOL) > 1)
SELCT a.ROWID FROM x, y WHERE x.XCOL=y.XCOL and x.ROWIDy.ROWID)

Upvotes: 1

Related Questions