Reputation: 169
I have an empty table T1 in Oracle that needs to be populated with rows selected from another table T2, like this:
INSERT INTO T1
SELECT * FROM T2;
The issue is that table T2 has about 10 million rows. On its own, the SELECT statement seems to finish in about 25-30 seconds, but the INSERT into T1 takes 20-30 minutes to complete.
Why does the statement above take so long, and what is the best approach to speed up inserting data into table T1 from table T2?
Upvotes: 0
Views: 653
Reputation: 91
Adding an APPEND hint may enable a direct-path insert, which can avoid generating extra REDO data used for recovery:
INSERT /*+ append */ INTO T1
SELECT * FROM T2;
Adding parallelism can further improve performance:
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ parallel append */ INTO T1
SELECT * FROM T2;
Those two features can shrink the run time from minutes to seconds, but there are a lot of caveats you need to understand. Direct-path writes lock the table and are not recoverable: the newly loaded data cannot be restored from the redo logs, so if the data is important you may not want it unprotected until the next full backup. Parallel queries work harder, not smarter, and may steal resources from more important jobs. Finding the optimal degree of parallelism is tricky, and direct-path inserts have many restrictions; they are silently disabled by triggers and by some kinds of referential integrity constraints, for example.
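Because direct-path can be silently disabled, it is worth verifying that it actually happened. One standard Oracle behavior you can lean on: after a direct-path insert, reading the table in the same transaction raises ORA-12838 until you commit. A quick sanity check (using the table names from the question):

```sql
INSERT /*+ append */ INTO T1
SELECT * FROM T2;

-- If direct path was used, this raises
-- "ORA-12838: cannot read/modify an object after modifying it in parallel".
-- If it runs without error, the hint was silently ignored and you got a
-- conventional insert.
SELECT COUNT(*) FROM T1;

COMMIT;
```

You can also confirm it after the fact by checking the execution plan for a `LOAD AS SELECT` operation instead of `LOAD TABLE CONVENTIONAL`.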
With the right hardware, system configuration, and code, you can realistically improve performance by 100x. But if you're new to these features, prepare to spend hours learning about them.
Upvotes: 2
Reputation: 8945
For one thing, the "apparent" execution time of a simple SELECT query is a bit misleading: the database engine figures out how to run the query and then returns only the first "chunk" of rows to you. (As you move through the result set, additional "chunks" are transparently fetched as needed.) But when you specify INSERT, the database has no choice but to actually process all of those millions of rows.
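To get a realistic baseline for how long the full scan of T2 really takes, force the database to read every row rather than timing just the first fetch, for example with an aggregate:

```sql
-- COUNT(*) cannot be answered without scanning the whole table
-- (or a suitable index), so its elapsed time is a much fairer
-- comparison point for the INSERT than "SELECT * FROM T2" in a
-- client that fetches rows lazily.
SELECT COUNT(*) FROM T2;
```

If this takes far longer than 25-30 seconds, the gap you observed is mostly the lazy fetching, not the INSERT itself.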
There are often specialized tools that are specifically intended for "bulk" data operations such as this one. These might be significantly faster.
Another standard practice is to temporarily disable indexes on the target table. This avoids the overhead of updating every index for every inserted row; each index is rebuilt in a single pass when you turn it back on. (The "bulk operations" tools mentioned above will usually do things like that automagically.)
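In Oracle, "disabling" a non-unique index for a bulk load is typically done by marking it unusable and rebuilding it afterwards. A minimal sketch, assuming a hypothetical index named `T1_IDX1` on the target table (unique indexes are different: rows cannot be inserted while a unique index is unusable):

```sql
-- Mark the index unusable so the insert does not maintain it row by row.
ALTER INDEX T1_IDX1 UNUSABLE;

-- Tell the session to ignore unusable indexes during DML
-- (this is the default in recent Oracle versions).
ALTER SESSION SET skip_unusable_indexes = TRUE;

INSERT /*+ append */ INTO T1
SELECT * FROM T2;

COMMIT;

-- Rebuild the index in one pass over the loaded data.
ALTER INDEX T1_IDX1 REBUILD;
```

For a large load, one rebuild at the end is usually much cheaper than millions of incremental index updates.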
Upvotes: 2