Reputation: 86

how to improve the execution time of this query?

I have a dimension which has around 11.000.000 records, 4 tables are joined to each other in ETL process to fill my dimension(dimtst).

insert into dimtst
select .... from  tdpst left outer join fhist 
left outer join dp2act
left outer join dp2cust

above query lasts alot(tdpst has 11 records and above query lasts around 15 minutes), Hence I created a temp table which joined tdpst and fhist and stored the result in tmpDpsInf(another temporary table which I created).

then `tmpDpsInf` will join with `C` and `D` in another query which has adequate execution time.

create table TMPFHIST
(
  abrnchcod NUMBER(4) not null,
  tbdptype  NUMBER(3) not null,
  cfcifno   NUMBER(8) not null,
  tdserial  NUMBER(3) not null,
  aistate   NUMBER(1)
)

--------------
create table TMPDPSINF
(
  abrnchcod     NUMBER(4) not null,
  tbdptype      NUMBER(3) not null,
  cfcifno       NUMBER(8) not null,
  tdserial      NUMBER(3) not null,
  ausrcode      NUMBER(4) not null,
  tdtitle       VARCHAR2(82),
  tdopndat      DATE,
  tdrnwdat      DATE,
  tdclsdat      DATE,
  acurrcode     CHAR(3) not null,
  abu_abrnchcod NUMBER(4) not null,
  aistate       NUMBER(1)
)
create table tdpst  
(
  abrnchcod     NUMBER(4) not null,
  tbdptype      NUMBER(3) not null,
  cfcifno       NUMBER(8) not null,
  tdserial      NUMBER(3) not null,
  ausrcode      NUMBER(4) not null,
  tdtitle       VARCHAR2(82),
  tdopndat      DATE,
  tdrnwdat      DATE,
  tdclsdat      DATE,
  acurrcode     CHAR(3) not null,
  abu_abrnchcod NUMBER(4) not null
)
--------

tmpDpsInf fills with below query:
    insert into tmpDpsInf
      select /*+parallel(12)*/
       d.ABRNCHCOD,
       d.TBDPTYPE,
       d.CFCIFNO,
       d.TDSERIAL,
       d.AUSRCODE AUSRCODE,
       d.tdtitle,
       trunc(d.tdopndat) TDOPNDAT,
       nvl(d.tdrnwdat, d.tdopndat),
       nvl(d.tdclsdat, to_date('1500/01/01', 'yyyy/mm/dd')) tdclsdat,
       d.acurrcode acurrcode,
       d.ABU_ABRNCHCOD ABU_ABRNCHCOD,
       tmp.aistate
        from tdpst  d
        left outer join fhist tmp
          on d.ABRNCHCOD = tmp.ABRNCHCOD
         and d.TBDPTYPE =  tmp.TBDPTYPE
         and d.CFCIFNO =   tmp.CFCIFNO
         and d.TDSERIAL =  tmp.TDSERIAL
       where d.TDOPNDAT <= currdate

1 Plan hash value: 3720425100 2
3 -------------------------------------------------------------------------------------------------------------------------- 4 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | 5 -------------------------------------------------------------------------------------------------------------------------- 6 | 0 | SELECT STATEMENT | | 12M| 1994M| 4248 (3)| 00:01:17 | | | | 7 | 1 | PX COORDINATOR | | | | | | | | | 8 | 2 | PX SEND QC (RANDOM) | :TQ10002 | 12M| 1994M| 4248 (3)| 00:01:17 | Q1,02 | P->S | QC (RAND) | 9 |* 3 | HASH JOIN RIGHT OUTER BUFFERED| | 12M| 1994M| 4248 (3)| 00:01:17 | Q1,02 | PCWP | | 10 | 4 | PX RECEIVE | | 3730K| 67M| 178 (3)| 00:00:04 | Q1,02 | PCWP | | 11 | 5 | PX SEND HASH | :TQ10000 | 3730K| 67M| 178 (3)| 00:00:04 | Q1,00 | P->P | HASH | 12 | 6 | PX BLOCK ITERATOR | | 3730K| 67M| 178 (3)| 00:00:04 | Q1,00 | PCWC | | 13 | 7 | TABLE ACCESS FULL | TMPFHIST | 3730K| 67M| 178 (3)| 00:00:04 | Q1,00 | PCWP | | 14 | 8 | PX RECEIVE | | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,02 | PCWP | | 15 | 9 | PX SEND HASH | :TQ10001 | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,01 | P->P | HASH | 16 | 10 | PX BLOCK ITERATOR | | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,01 | PCWC | | 17 |* 11 | TABLE ACCESS FULL | TDPST | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,01 | PCWP | | 18 -------------------------------------------------------------------------------------------------------------------------- 19
20 Predicate Information (identified by operation id): 21 --------------------------------------------------- 22
23 3 - access("D"."TDSERIAL"="TMP"."TDSERIAL"(+) AND "D"."CFCIFNO"="TMP"."CFCIFNO"(+) AND 24 "D"."TBDPTYPE"="TMP"."TBDPTYPE"(+) AND "D"."ABRNCHCOD"="TMP"."ABRNCHCOD"(+)) 25 11 - filter("D"."TDOPNDAT"<=TO_DATE('20130601','yyyy/mm/dd')) 26
27 Note 28 ----- 29 - dynamic sampling used for this statement (level=4) 30 - Degree of Parallelism is 10 because of hint

I executed above query with /+leading (d,tmp)/ hint.the plan is shown below:

Plan hash value: 1033900074

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |

| 0 | SELECT STATEMENT | | 12M| 1994M| 4255 (3)| 00:01:17 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 12M| 1994M| 4255 (3)| 00:01:17 | Q1,02 | P->S | QC (RAND) | |* 3 | HASH JOIN OUTER BUFFERED| | 12M| 1994M| 4255 (3)| 00:01:17 | Q1,02 | PCWP | | | 4 | PX RECEIVE | | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,02 | PCWP | | | 5 | PX SEND HASH | :TQ10000 | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,00 | P->P | HASH | | 6 | PX BLOCK ITERATOR | | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,00 | PCWC | | |* 7 | TABLE ACCESS FULL | TDPST | 12M| 1774M| 4059 (2)| 00:01:14 | Q1,00 | PCWP | | | 8 | PX RECEIVE | | 3730K| 67M| 178 (3)| 00:00:04 | Q1,02 | PCWP | | | 9 | PX SEND HASH | :TQ10001 | 3730K| 67M| 178 (3)| 00:00:04 | Q1,01 | P->P | HASH | | 10 | PX BLOCK ITERATOR | | 3730K| 67M| 178 (3)| 00:00:04 | Q1,01 | PCWC | |

| 11 | TABLE ACCESS FULL | TMPFHIST | 3730K| 67M| 178 (3)| 00:00:04 | Q1,01 | PCWP | |

Predicate Information (identified by operation id):

3 - access("D"."TDSERIAL"="TMP"."TDSERIAL"(+) AND "D"."CFCIFNO"="TMP"."CFCIFNO"(+) AND "D"."TBDPTYPE"="TMP"."TBDPTYPE"(+) AND "D"."ABRNCHCOD"="TMP"."ABRNCHCOD"(+)) 7 - filter("D"."TDOPNDAT"<=TO_DATE('20130601','yyyy/mm/dd'))

Note

dynamic sampling used for this statement (level=4)
Degree of Parallelism is 10 because of hint

I created one index on d.TDOPNDAT and one index on (tmp.ABRNCHCOD, tmp.TBDPTYPE, tmp.CFCIFNO,tmp.TDSERIAL)

.In addition to,Optimizer didn't use any index, I force optimizer to use created indexes,but query cost increases Exponentially!! after doing all of mentioned works, query time is still high!

dos any one have any suggestion to reduce this query time ? Thanks

Upvotes: 0

Answers (2)

James

Reputation: 3411

Creating the intermediate table shouldn't be necessary, as, given the correct information, the query optimiser should be able to make a correct decision on the fastest way to do this in one step without requiring you to work out the intermediate step. Likewise with indexes, generally the query optimiser will get the decision right on whether or not to use them - as long as it has the right inputs (in the form of stats / constraints etc).

Indexes are not always good - they're powerful where a small subset of the data in the table needs to be found, but in cases where you're reading the whole table they won't help you much.

As a starting point, ask Oracle what plan it is using the execute your long running query. This can be done with:

explain plan for 
select .... from  tdpst left outer join fhist 
left outer join dp2act
left outer join dp2cust;

Then,

select * from table(dbms_xplan.display);

NOTE - I don't have a database on this laptop (it is the weekend), so there may be some typos in the commands above, see the docs if they don't work as expected.

Look at the results and consider: How is it joining the tables - is it using a hash, a merge join or nested loops? Is it joining the table in the right order.

My guess here is that it will be making an incorrect decision on the join order or join type. Working on that assumption, the next thing I would check would be the cardinality estimates. Run the query again with a /*+ GATHER_PLAN_STATISTICS */ hint (you'll need to actually run it with this hint in - not just explain plan it again) and check the difference between the estimated and actual row numbers using:

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));

Check for big differences between the estimated rows and actual rows. A big difference mean a problem in the statistics that is leading the optimizer astray. From this point you can consider.

Gathering aditional stats on the tables, for example if one of the tables has a skewed column that is in the join (see DBMS_STATS)
Gathering extended stats if you are joining on two or more correlated columns

Then repeat and see if you get a performance improvement

Good luck..

Upvotes: 1

Andrew Jansen

Reputation: 196

First off, its typically best practice to let the engine handle index selection. If an index isnt being used, it's usually because criteria for the join doesn't allow the index to be utilised correctly. Thats why your execution costs increase.

Temp tables are a good alternative if you have minimal IO and your temp database on a SSD drive. Environment is a factor here. If your data selection is large, consider pagination.

Otherwise I would recommend an indexed view. Your columns will be specified with your join and allowing index to be pre-computed for optimal results. I think Oracle calls them materialized views. It should also force computation of the join if you index fields from each of the tables.

If my answer is wrong, feel free to correct me. Most of my experience is from MSSQL. My Oracle experience is a little limited