Reputation: 239

which delete statement is better for deleting millions of rows

I have table which contains millions of rows. I want to delete all the data which is over a week old based on the value of column last_updated.

so here are my two queries,

Approach 1:

Delete from A where to_date(last_updated,''yyyy-mm-dd'')< sysdate-7;

Approach 2:

l_lastupdated varchar2(255) := to_char(sysdate-nvl(p_days,7),'YYYY-MM-DD');
insert into B(ID) select ID from A where LASTUPDATED < l_lastupdated;
delete from A where id in (select id from B);

which one is better considering performance, safety and locking?

Upvotes: 1

Answers (3)

evilive

Reputation: 996

Your stored dateformat seems suitable for proper sorting, so you could go the other way round and convert sysdate to string:

--this is false today    
select * from dual where '2019-06-05' < to_char(sysdate-7, 'YYYY-MM-DD'); 

--this is true today
select * from dual where '2019-05-05' < to_char(sysdate-7, 'YYYY-MM-DD');

So it would be:

Delete from A where last_updated < to_char(sysdate-7, ''yyyy-mm-dd'');

It has the benefit that your default index (if there is any) will be used.

It has the disadvantage on relying on the String/Varchar ordering which might be changed i.e. bei NLS changes (if i remember right), so in any case you should do a little testing before...

In the long term, you should of cource alter the colum to a proper date-datatype, but I guess that doesn't help you right now ;)

Upvotes: 1

Shweta

Reputation: 239

Assuming the delete removes a significant fraction of the data & millions of rows, approach three:

create table tmp
  Delete from A where to_date(last_updated,''yyyy-mm-dd'')< sysdate-7;

drop table a;

rename tmp to a;

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2345591157689

Obviously you'll need to copy over all the indexes, grants, etc. But online redefinition can help with this https://oracle-base.com/articles/11g/online-table-redefinition-enhancements-11gr1

When you get to 12.2, there's another simpler option: a filtered move.

This is an alter table move operation, with an extra clause stating which rows you want to keep:

create table t (
  c1 int
);

insert into t values ( 1 );
insert into t values ( 2 );
commit;

alter table t
  move including rows where c1 > 1;

select * from t;

C1   
    2

While you're waiting to upgrade to 12.2+ and if you don't want to use the create-as-select method for some reason then approach 1 is superior:

Both methods delete the same rows from A* => it's the same amount of work to do the delete
Option 1 has one statement; Option 2 has two statements; 2 > 1 => option 2 is more work

*Statement level consistency means you might get different results running the processes. Say another session tries to update an old row that your process will remove.

With just the delete, the update will be blocked until the delete finishes. At which point the row's gone, so the update does nothing.

Whereas if you do the insert first, the other session can update & commit the row before the insert completes. So the update "succeeds". But the delete will then remove it! Which can lead to some unhappy customers...

Upvotes: 1

Boneist

Reputation: 23588

If you are trying to delete most of the rows in the table, I would advise you go with a different approach, namely:

create <new table name> as
select *
from   <old table name>
where  <predicates for the data you want to keep>;

then

drop table <old table name>;

and finally you can rename the new table back to the old table.

You could always partition the new table (i.e. create the new table with a separate statement containing the partitioning clauses, and then have an insert as select into the new table from the old table).

That way, when you need to delete rows, it's a simple matter of dropping the relevant partition(s).

Upvotes: 0

which delete statement is better for deleting millions of rows

Answers (3)

Related Questions