Nora
Nora

Reputation: 55

Query Optimization - subselect in Left Join

I'm working on optimizing a sql query, and I found a particular line that appears to be killing my queries performance:

LEFT JOIN anothertable lastweek
AND lastweek.date>= (SELECT MAX(table.date)-7 max_date_lweek 
                                 FROM table table
                                 WHERE table.id= lastweek.id) 
AND lastweek.date< (SELECT MAX(table.date) max_date_lweek 
                                 FROM table table  
                                 WHERE table.id= lastweek.id) 

I'm working on a way of optimizing these lines, but I'm stumped. If anyone has any ideas, please let me know!

-----------------------------------------------------------------------------------------------------------
| Id   | Operation                         | Name            | Rows      | Bytes      | Cost   | Time     |
-----------------------------------------------------------------------------------------------------------
|    0 | SELECT STATEMENT                  |                 |   1908654 |  145057704 | 720461 | 00:00:29 |
|  * 1 |   HASH JOIN RIGHT OUTER           |                 |   1908654 |  145057704 | 720461 | 00:00:29 |
|    2 |    VIEW                           | VW_DCL_880D8DA3 |    427487 |    7694766 | 716616 | 00:00:28 |
|  * 3 |     HASH JOIN                     |                 |    427487 |   39328804 | 716616 | 00:00:28 |
|    4 |      VIEW                         | VW_SQ_2         |   7174144 |  193701888 | 278845 | 00:00:11 |
|    5 |       HASH GROUP BY               |                 |   7174144 |  294139904 | 278845 | 00:00:11 |
|    6 |        TABLE ACCESS STORAGE FULL  | TASK            | 170994691 | 7010782331 |  65987 | 00:00:03 |
|  * 7 |      HASH JOIN                    |                 |   8549735 |  555732775 | 429294 | 00:00:17 |
|    8 |       VIEW                        | VW_SQ_1         |   7174144 |  172179456 | 278845 | 00:00:11 |
|    9 |        HASH GROUP BY              |                 |   7174144 |  294139904 | 278845 | 00:00:11 |
|   10 |         TABLE ACCESS STORAGE FULL | TASK            | 170994691 | 7010782331 |  65987 | 00:00:03 |
|   11 |       TABLE ACCESS STORAGE FULL   | TASK            | 170994691 | 7010782331 |  65987 | 00:00:03 |
| * 12 |    TABLE ACCESS STORAGE FULL      | TASK            |   1908654 |  110701932 |   2520 | 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - access("SYS_ID"(+)="TASK"."PARENT")
* 3 - access("ITEM_2"="TASK_LWEEK"."SYS_ID")
* 3 - filter("TASK_LWEEK"."SNAPSHOT_DATE"<"MAX_DATE_LWEEK")
* 7 - access("ITEM_1"="TASK_LWEEK"."SYS_ID")
* 7 - filter("TASK_LWEEK"."SNAPSHOT_DATE">=INTERNAL_FUNCTION("MAX_DATE_LWEEK"))
* 12 - storage("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE@!)-15)
* 12 - filter("TASK"."CLOSED_AT" IS NULL OR "TASK"."CLOSED_AT">=TRUNC(SYSDATE@!)-15)

Upvotes: 0

Views: 74

Answers (3)

Matthew McPeak
Matthew McPeak

Reputation: 17944

Looking at your explain plan, the only table being accessed is TASK. From that, I infer that the tables in your example: ANOTHERTABLE and TABLE are actually the same table and that, therefore, you are trying to get the last week of data that exists in that table for each id value.

If all that is true, it should be much faster to use an analytic function to get the max date value for each id and then limit based on that.

Here is an example of what I mean. Note I use "dte" instead of "date", to remove confusion with the reserved word "date".

LEFT JOIN ( SELECT lastweek.*, 
                   max(dte) OVER ( PARTITION BY id ) max_date 
            FROM   anothertable lastweek ) lastweek
ON 1=1  -- whatever other join conditions you have, seemingly omitted from your post
AND lastweek.dte >= lastweek.max_date - 7;

Again, this only works if I am correct in thinking that table and anothertable are actually the same table.

Upvotes: 0

Roberto Hernandez
Roberto Hernandez

Reputation: 8518

Well, you are not even showing the select. As I can see that the select is done over Exadata ( Table Access Storage Full ) , perhaps you need to ask yourself why do you need to make 4 access to the same table.

You access fourth times ( lines 6, 10, 11, 12 ) to the main table TASK with 170994691 rows ( based on estimation of the CBO ). I don't know whether the statistics are up-to-date or it is optimizing sampling kick in due to lack of good statistics.

A solution could be use WITH for generating intermediate results that you need several times in your outline query

with my_set as 
 (SELECT MAX(table.date)-7 max_date_lweek  , 
         max(table.date) as max_date, 
         id from FROM table ) 
select 
.......................
from ...
left join anothertable lastweek on ( ........ ) 
left join myset on ( anothertable.id = myset.id ) 
where 
      lastweek.date >= myset.max_date_lweek  
      and 
      lastweek.date < myset.max_date

Please, take in account that you did not provide the query, so I am guessing a lot of things.

Upvotes: 1

Atif
Atif

Reputation: 2210

Since complete information is not available I will suggest:

You are using the same query twice then why not use CTE such as

with CTE_example as (SELECT MAX(table.date), max_date_lweek, ID 
                                 FROM table table)

Upvotes: 0

Related Questions