anandh04

Reputation: 86

Repeating rows using 'connect by' is running for long time

I want to insert a number of rows into a child table based on the start and end dates in a parent table. I tried the following query, but it has been running for over an hour.

INSERT INTO CHILD_TABLE
  with num as (
    select level as rnk
      from dual
   connect by level<=300
  )
  Select Data, start_date,
         end_date, rnk
    From parent_table
    Join num
      ON (num.rnk <= end_date-start_date)

The parent table has more than a million rows.

Upvotes: 1

Views: 117

Answers (1)

Jon Heller

Reputation: 36922

With a CARDINALITY hint, direct-path inserts, parallelism, and removing the common table expression, a generic version of your problem can be improved from 6 minutes to 22 seconds. There may be some other factor involved that explains why your version runs in 60 minutes and mine runs in 6 minutes. If that's still an issue after making these changes, use SQL monitoring to find out which operations and wait events are the problem.

Sample Schema

--drop table child_table;
--drop table parent_table;

create table parent_table(data varchar2(100), start_date date, end_date date);
create table child_table (data varchar2(100), start_date date, end_date date, rank number);

--Insert 1 million rows, gather statistics.
begin
    for i in 1 .. 10 loop
        insert into parent_table
        select level, sysdate, sysdate+50
        from dual
          connect by level <= 100000;
    end loop;
end;
/
begin
    dbms_stats.gather_table_stats(user, 'parent_table');
end;
/

Original version - runs in 6-10 minutes on my old desktop

insert into child_table
with num as
(
    select level as rnk 
    from dual 
    connect by level<=300
)
select data, start_date, end_date, rnk
from parent_table
join num 
    on num.rnk <= end_date-start_date;

commit;
truncate table child_table;

New version - runs in 22 seconds

insert /*+ append parallel */ into child_table
select data, start_date, end_date, rnk
from parent_table
join
(
    select /*+ cardinality(300) */ level as rnk 
    from dual 
    connect by level<=300
) num
    on num.rnk <= end_date-start_date;

Description of Changes

The CARDINALITY hint informs the optimizer that the inline view returns 300 rows, whereas the original estimate is 1 row. The improved cardinality estimate changes the plan from a NESTED LOOP to a MERGE JOIN.
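As a sanity check (not part of the original answer), the estimate and the join method can be inspected with EXPLAIN PLAN and DBMS_XPLAN; the table and column names below are the ones from the sample schema above:

```sql
explain plan for
select data, start_date, end_date, rnk
from parent_table
join
(
    select /*+ cardinality(300) */ level as rnk
    from dual
    connect by level<=300
) num
    on num.rnk <= end_date-start_date;

--With the hint, the plan should show a MERGE JOIN and a Rows
--estimate of 300 for the CONNECT BY inline view.
select * from table(dbms_xplan.display);
```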

Direct-path inserts avoid most redo generation, although with this simple sample table the INSERT part of the statement is expensive either way. It's possible that your real-world example spends a huge amount of time in the INSERT if there are many indexes or foreign keys that need to be validated. Note that without redo, the table changes are not automatically backed up.

Parallelism makes use of multiple resources and can be a real game-changer for performance. But parallelism puts more strain on a system and can be very "unfair" to other processes, and it requires Enterprise Edition, a sane configuration, etc.
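One caveat worth adding (my addition, not stated in the answer above): parallel DML is disabled by default at the session level. Without enabling it, the query side of the statement may still run in parallel, but the INSERT itself will be serial:

```sql
--Enable parallel DML for this session before running the INSERT.
--In 12c and later, the statement-level ENABLE_PARALLEL_DML hint
--is an alternative to the ALTER SESSION.
alter session enable parallel dml;

insert /*+ append parallel */ into child_table
select data, start_date, end_date, rnk
from parent_table
join
(
    select /*+ cardinality(300) */ level as rnk
    from dual
    connect by level<=300
) num
    on num.rnk <= end_date-start_date;

commit;
```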

Common table expressions (CTEs) are meant for repetitive query blocks. CTEs sometimes cause performance issues, but not in this case. Removing it here is more a matter of style; inline views are easier to debug than CTEs, though that's difficult to demonstrate in this small example.

Finally, if none of the above works, you need to increase the granularity of your profiling. Most tuning methods only identify which SQL statements are slow and then guess at which operation within the statement is the problem. There's no need to guess; Real-Time SQL Monitoring will tell you exactly how long each operation takes and what it's waiting for. Find the SQL_ID and run a statement like this:

select dbms_sqltune.report_sql_monitor(sql_id => '13gdzd4w5fx4y', type => 'text') from dual;
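If the SQL_ID isn't known (the literal above is just an example), one way to find it, assuming the session has access to the V$SQL view, is to search the shared pool by statement text:

```sql
--Find recent executions of the INSERT and their SQL_IDs.
select sql_id, elapsed_time/1000000 as elapsed_seconds, sql_text
from v$sql
where lower(sql_text) like 'insert%child_table%'
order by last_active_time desc;
```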

Upvotes: 1
