Reputation: 1878
We have some useless historical data in a database which sums upto 190 million (19 crores) rows in database contributing to 33-GB . Now I got a task to delete these much rows in one go and if in any case something breaks, I should be able to rollback the transaction.
I will select them based on some flag like deleted ='1' which from my estimation counts to 190 million out of 200 million. So first I have to do a select operation and then delete those id's.
As mentioned in this article, it is taking 4 hours to delete 1.5 million records, which count is far less than my case and I am wondering if I proceed with single deleted approach how much time it would take to delete 190 million records.
Should I use Spring-Batch for selecting id's of rows and then delete them batch by batch or issue a single statement by passing id's in IN
clause.
What would be a better approach please suggest.
Upvotes: 2
Views: 1939
Reputation: 2496
Maybe what you said is usual and normal performance for Java, but at my notebook deleting of 1M records requires about a minute - without Java, of course.
If you wish to do it good, I'd say you should use partitions. First of all, convert the plain table(s) into the partitioned one(s) with all data into one (current) partition. Then, prepare "historical" partitions and move unnecessary data into them. And after that you'll be ready to do anything. You'll can to move it offline (but restore when needed), you'll be able to exclude this data in seconds using EXCHANGE PARTITION and so on.
Upvotes: 0
Reputation: 1973
If doing it in Java is not mandatory, I'd create a PL/SQL procedure, open a cursor and use DELETE ... WHERE CURRENT OF. Maybe it's not super fast, but it's secure because you will have no rollback segment problems. Using a normal DELETE even without transaction is an atomic operation that must be rolled back if something fails.
Upvotes: 0
Reputation: 463
you can do copying required data from historical table to a new table and drop the old table entirely and rename the new table to old table name later -- as said by Raj in above post. this is best way to do.
and also you can use nologging and parallel options to speed up for example :
create table History_new parallel 4 nologging as select /*+parallel(source 4) */ * from History where col1 = 1 and ... ;
Upvotes: 1
Reputation: 9886
Why not moving the required data from historical table to a new table and dropping the old table entirely? You might rename the new table to old table name later on.
Upvotes: 1