Reputation: 263
I have a table in Oracle, which gets populated daily with approx. 350k records. I've created a procedure to keep only 2 dates in the table, the moment when there are records with a third distinct date inserted, it will delete the records with the Minimum date.
The solution below works, but it's taking too long to execute, since there are 1M records in the table:
CREATE OR REPLACE PROCEDURE DELETE_PREV_DT
AS
nCount NUMBER;
tablename varchar2(50);
BEGIN
FOR aRow IN (SELECT *
FROM TTTAAAA
)
LOOP
tablename := 'TTTAAAA';
EXECUTE IMMEDIATE 'SELECT COUNT(DISTINCT DATE_ACCUMULATED) FROM ' || tablename
INTO nCount;
IF nCount > 2 THEN
EXECUTE IMMEDIATE 'DELETE FROM ' || tablename ||
' WHERE DATE_ACCUMULATED = (SELECT MIN(DATE_ACCUMULATED) ' ||
'FROM ' || tablename || ')';
END IF;
END LOOP;
END;
/
Can someone provide a modification so this procedure can run faster?
Upvotes: 0
Views: 72
Reputation: 1203
@Littlefoot's answer is excellent.
I am providing an additional perspective. Since the requirement is to delete rows based on dates, this batch job is suitable for scheduling to be run once a day.
The table is large (~1M rows). So, there will be daily insertions and deletions. In practice, this results in table and index fragmentation that causes performance degradation. Typically, this is handled by rebuilding the indexes on a routine basis during a db maintenance window.
Given these practical considerations, an alternative approach might be to truncate and re-populate the table or re-create the table.
In Oracle, you can use the CTAS pattern to create a table from the result of a query.
CREATE TABLE dest_tab AS SELECT * FROM source_tab;
You can also do INSERT using the APPEND hint, if you don't want to dynamically create tables.
INSERT /*+ APPEND */ INTO dest_tab SELECT * FROM source_tab;
For repopulating dest_tab, you can do,
TRUNCATE TABLE dest_tab;
INSERT /*+ APPEND */ INTO dest_tab SELECT * FROM source_tab;
The advantage of the CTAS or INSERT /*+ APPEND */ is that you can do PARALLEL reads and PARALLEL writes by adding those hints/clauses to improve the performance of the batch job by using available resources -- cores, temp space and memory.
The SELECT query that retains the data of interest would be:
SELECT
x.c1,
x.c2
FROM (SELECT /*+ PARALLEL(a, 4) */
a.c1,
a.c2,
dense_rank() over (order by a.datum desc) rn
FROM test a) x
where x.rn <= 2;
You can now use CTAS:
CREATE TABLE xx AS
SELECT
x.c1,
x.c2
FROM (SELECT /*+ PARALLEL(a, 4) */
a.c1,
a.c2,
dense_rank() over (order by a.datum desc) rn
FROM test a) x
where x.rn <= 2;
or if you prefer the TRUNCATE / INSERT /*+ APPEND */ pattern,
TRUNCATE TABLE xx;
INSERT /*+ APPEND PARALLEL */ INTO xx
SELECT
x.c1,
x.c2
FROM (SELECT /*+ PARALLEL(a, 4) */
a.c1,
a.c2,
dense_rank() over (order by a.datum desc) rn
FROM test a) x
where x.rn <= 2;
COMMIT;
Note the parallel reads and writes in the hints.
Once the data is there in the temporary xx table, you can then drop and recreate the original table, or do a TRUNCATE and INSERT /*+ APPEND */ again into it. Since this is a bulk insert of ~1M records, you can significantly improve performance by dropping or disabling all indexes and constraints on the destination table and rebuilding them after the insert.
Don't forget to enable parallel data manipulation language for the session using:
ALTER SESSION ENABLE PARALLEL DML;
I have used this technique to populate data marts with 100M+ rows. These parallel table build techniques can cut your large table maintenance from several hours to minutes, if done properly.
There is yet another approach that can also be used. This uses Oracle Partitioned Tables. Partition the table by day of week. Then once a day during the db maintenance window, truncate partitions that are older than 2 days and rebuild all indexes. This means no change to inserts. There is no querying involved for deleting the unnecessary records.
Upvotes: 3
Reputation: 142710
Row-by-row promises to be slow-by-slow, along with context switching because of dynamic SQL.
How about such an approach? Sort dates, remove ones that aren't in the top 2.
SQL> select * from test order by datum, id;
ID DATUM
---------- ----------
1 21.08.2020
2 21.08.2020
3 21.08.2020
4 22.08.2020
5 22.08.2020
6 23.08.2020
7 23.08.2020
8 24.08.2020
8 rows selected.
SQL> delete from test t
2 where t.datum in (select x.datum
3 from (select a.datum,
4 dense_rank() over (order by a.datum desc) rn
5 from test a
6 ) x
7 where x.rn > 2
8 );
5 rows deleted.
SQL> select * from test order by datum, id;
ID DATUM
---------- ----------
6 23.08.2020
7 23.08.2020
8 24.08.2020
SQL>
Upvotes: 3