Reputation: 4637
We have a table that has had the same data inserted into it twice by accident meaning most (but not all) rows appears twice in the table. Simply put, I'd like an SQL statement to delete one version of a row while keeping the other; I don't mind which version is deleted as they're identical.
Table structure is something like:
FID, unique_ID, COL3, COL4....
Unique_ID
is the primary key, meaning each one appears only once.
FID
is a key that is unique to each feature, so if it appears more than once then the duplicates should be deleted.
To select features that have duplicates would be:
select count(*) from TABLE GROUP by FID
Unfortunately I can't figure out how to go from that to a SQL delete statement that will delete extraneous rows leaving only one of each.
This sort of question has been asked before, and I've tried the create table with distinct, but how do I get all columns without naming them? This only gets the single column FID and itemising all the columns to keep gives an: ORA-00936: missing expression
CREATE TABLE secondtable NOLOGGING as select distinct FID from TABLE
Upvotes: 8
Views: 32189
Reputation: 5648
I know this is an old question, I came up with a different solution entirely:
partition them by what makes them duplicate
and then use that as a row number to throw away the extras:
delete from MY_TABLE where unique_id in (
select unique_id
from (
select mt.*,
row_number() over (
partition by mt.RAW_VALUE,mt.END_DATE_TRUNC_UTC
order by mt.END_DATE_TRUNC_UTC
) rn
FROM
mytable mt
) where rn > 1
);
So, in my case above, the fields that defined data as duplicate were the RAW_VALUE
and the END_DATE_TRUNC_UTC
This gave duplicate rows unique row numbers within their group. Then, I could just throw away everything except rn = 1
Upvotes: 1
Reputation: 29
You can try this.
delete from tablename a
where a.logid, a.pointid, a.routeid) in (select logid, pointid, routeid from tablename
group by logid, pointid, routeid having count(*) > 1)
and rowid not in (select min(rowid) from tablename
group by logid, pointid, routeid having count(*) > 1)
Upvotes: 1
Reputation: 5935
Try with this.
DELETE FROM firsttable WHERE unique_ID NOT IN
(SELECT MAX(unique_ID) FROM firsttable GROUP BY FID)
EDIT: One explanation:
SELECT MAX(unique_ID) FROM firsttable GROUP BY FID;
This sql statement will pick each maximum unique_ID row from each duplicate rows group. And delete statement will keep these maximum unique_ID rows and delete other rows of each duplicate group.
Upvotes: 0
Reputation: 2318
Try this
DELETE FROM table_name A WHERE ROWID > (
SELECT min(rowid) FROM table_name B
WHERE A.FID = B.FID)
Upvotes: 4
Reputation: 231651
If you don't care which row is retained
DELETE FROM your_table_name a
WHERE EXISTS( SELECT 1
FROM your_table_name b
WHERE a.fid = b.fid
AND a.unique_id < b.unique_id )
Once that's done, you'll want to add a constraint to the table that ensures that FID
is unique.
Upvotes: 10
Reputation: 1754
A suggestion
DELETE FROM x WHERE ROWID IN
(WITH y AS (SELECT xCOL, MIN(ROWID) FROM x GROUP BY xCOL HAVING COUNT(xCOL) > 1)
SELCT a.ROWID FROM x, y WHERE x.XCOL=y.XCOL and x.ROWIDy.ROWID)
Upvotes: 1