Frank Ockenfuss
Frank Ockenfuss

Reputation: 2043

How to reuse sql with subquery factoring by database view

What is the best practice to convert the following sql statement using a subquery (with data as clause) to use it in a database view. AFAIK the with data as clause is not supported in database views (Edited: Oracle supports Common Table Expressions), but in my case the subquery factoring offers advantage for performance. If I create a database view using Common Table Expression, than this advantage is lost.

Please have a look at my example:

Description of query

a_table Millions of entries, by the select statement a few thousand are selected.

anchor_table For each entry in a_table exists a corresponding entry in anchor_table. By this table is determined at runtime exactly one row as anchor. See example below.

horizon_table For each selection exactly one entry is determined at runtime (all entries of a selection of a_table have the same horizon_id)

Please notice: This is a strongly simplified sql that works fine so far.

In reality more than 20 tables are joined together to get the results of data. The where clause is much more complex. Further columns of horizon_table and anchor_table are required to prepare my where condition and result list in the subquery, i.e. moving these tables to the main query is no solution.

with data as (
  select
  a_table.id,
  a_table.descr,
  horizon_table.offset,
  case
    when anchor_table.a_date = trunc(sysdate) then
    1
    else
    0
  end as anchor,
  row_number() over( 
  order by a_table.a_position_field) as position
  from a_table
  join anchor_table on (anchor_table.id = a_table.anchor_id)
  join horizon_table on (horizon_table.id = a_table.horizon_id)
  where a_table.a_value between 1 and 10000
)
select * 
from data d
where d.position between ( 
    select d1.position - d.offset 
    from data d1 
    where d1.anchor = 1) 
  and ( 
    select d2.position + d.offset 
    from data d2 
    where d2.anchor = 1) 

example of with data as select:

id   descr   offset  anchor   position
1    bla     3       0        1
2    blab    3       0        2
5    dfkdj   3       0        3
4    dld     3       0        4
6    oeroe   3       1        5
3    blab    3       0        6
9    dfkdj   3       0        7
14   dld     3       0        8
54   oeroe   3       0        9
...

result of select * from data

id   descr   offset  anchor   position
2    blab    3       0        2
5    dfkdj   3       0        3
4    dld     3       0        4
6    oeroe   3       1        5
3    blab    3       0        6
9    dfkdj   3       0        7
14   dld     3       0        8

I.E. the result is the anchor row and the tree rows above and below.

How can I achieve the same within a database view?

My attempt failed as I expected by performance issues:

Create a view data of with data as select above Use this view as above

select * 
    from data d
    where d.position between ( 
        select d1.position - d.offset 
        from data d1 
        where d1.anchor = 1) 
      and ( 
        select d2.position + d.offset 
        from data d2 
        where d2.anchor = 1)

Thank you for any advice :-)

Amendment

If I create a view as recommended in first comment, than I get the same performance issue. Oracle does not use the subquery to restrict the results.

Here are the execution plans of my production queries (please click at the images)

a) SQL enter image description here

b) View enter image description here

Here are the execution plans of my test cases

-- Create Testdata table with ~ 1,000,000 entries
insert into a_table
  (id, descr, a_position_field, anchor_id, horizon_id, a_value)
  select level, 'data' || level, mod(level, 10), level, 1, level
    from dual
  connect by level <= 999999;

insert into anchor_table
  (id, a_date)
  select level, trunc(sysdate) - 500000 + level
    from dual
  connect by level <= 999999;

insert into horizon_table (id, offset) values (1, 50);

commit;

-- Create view
create or replace view testdata_vw as
  with data as
   (select a_table.id,
           a_table.descr,
           a_table.a_value,
           horizon_table.offset,
           case
             when anchor_table.a_date = trunc(sysdate) then
              1
             else
              0
           end as anchor,
           row_number() over(order by a_table.a_position_field) as position
      from a_table
      join anchor_table
        on (anchor_table.id = a_table.anchor_id)
      join horizon_table
        on (horizon_table.id = a_table.horizon_id))
  select *
    from data d
   where d.position between
         (select d1.position - d.offset from data d1 where d1.anchor = 1) and
         (select d2.position + d.offset from data d2 where d2.anchor = 1);

-- Explain plan of subquery factoring select statement
explain plan for
  with data as
   (select a_table.id,
           a_table.descr,
           a_value,
           horizon_table.offset,
           case
             when anchor_table.a_date = trunc(sysdate) then
              1
             else
              0
           end as anchor,
           row_number() over(order by a_table.a_position_field) as position
      from a_table
      join anchor_table
        on (anchor_table.id = a_table.anchor_id)
      join horizon_table
        on (horizon_table.id = a_table.horizon_id)

     where a_table.a_value between 500000 - 500 and 500000 + 500)
  select *
    from data d
   where d.position between
         (select d1.position - d.offset from data d1 where d1.anchor = 1) and
         (select d2.position + d.offset from data d2 where d2.anchor = 1);

select plan_table_output
  from table(dbms_xplan.display('plan_table', null, null));

/*

Note: Size of SYS_TEMP_0FD9D6628_284C5768 ~ 1000 rows

Plan hash value: 1145408420

----------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |     1 |    62 |  1791   (2)| 00:00:31 |
|   1 |  TEMP TABLE TRANSFORMATION |                             |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6628_284C5768 |       |       |            |          |
|   3 |    WINDOW SORT             |                             |    57 |  6840 |  1785   (2)| 00:00:31 |
|*  4 |     HASH JOIN              |                             |    57 |  6840 |  1784   (2)| 00:00:31 |
|*  5 |      TABLE ACCESS FULL     | A_TABLE                     |    57 |  4104 |  1193   (2)| 00:00:21 |
|   6 |      MERGE JOIN CARTESIAN  |                             |  1189K|    54M|   586   (2)| 00:00:10 |
|   7 |       TABLE ACCESS FULL    | HORIZON_TABLE               |     1 |    26 |     3   (0)| 00:00:01 |
|   8 |       BUFFER SORT          |                             |  1189K|    24M|   583   (2)| 00:00:10 |
|   9 |        TABLE ACCESS FULL   | ANCHOR_TABLE                |  1189K|    24M|   583   (2)| 00:00:10 |
|* 10 |   FILTER                   |                             |       |       |            |          |
|  11 |    VIEW                    |                             |    57 |  3534 |     2   (0)| 00:00:01 |
|  12 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6628_284C5768 |    57 |  4104 |     2   (0)| 00:00:01 |
|* 13 |    VIEW                    |                             |    57 |   912 |     2   (0)| 00:00:01 |
|  14 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6628_284C5768 |    57 |  4104 |     2   (0)| 00:00:01 |
|* 15 |    VIEW                    |                             |    57 |   912 |     2   (0)| 00:00:01 |
|  16 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6628_284C5768 |    57 |  4104 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID" AND 
              "ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
   5 - filter("A_TABLE"."A_VALUE">=499500 AND "A_TABLE"."A_VALUE"<=500500)
  10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM  (SELECT + CACHE_TEMP_TABLE 
              ("T1")  "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM 
              "SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= 
              (SELECT "D2"."POSITION"+:B2 FROM  (SELECT + CACHE_TEMP_TABLE ("T1")  "C0" "ID","C1" 
              "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM 
              "SYS"."SYS_TEMP_0FD9D6628_284C5768" "T1") "D2" WHERE "D2"."ANCHOR"=1))
  13 - filter("D1"."ANCHOR"=1)
  15 - filter("D2"."ANCHOR"=1)

Note
-----
   - dynamic sampling used for this statement (level=4)

*/

-- Explain plan of database view
explain plan for
  select *
    from testdata_vw
   where a_value between 500000 - 500 and 500000 + 500;

select plan_table_output
  from table(dbms_xplan.display('plan_table', null, null));

/*

Note: Size of SYS_TEMP_0FD9D662A_284C5768 ~ 1000000 rows

Plan hash value: 1422141561

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                             |  2973 |   180K|       | 50324   (1)| 00:14:16 |
|   1 |  VIEW                       | TESTDATA_VW                 |  2973 |   180K|       | 50324   (1)| 00:14:16 |
|   2 |   TEMP TABLE TRANSFORMATION |                             |       |       |       |            |          |
|   3 |    LOAD AS SELECT           | SYS_TEMP_0FD9D662A_284C5768 |       |       |       |            |          |
|   4 |     WINDOW SORT             |                             |  1189K|   136M|   147M| 37032   (1)| 00:10:30 |
|*  5 |      HASH JOIN              |                             |  1189K|   136M|       |  6868   (1)| 00:01:57 |
|   6 |       TABLE ACCESS FULL     | HORIZON_TABLE               |     1 |    26 |       |     3   (0)| 00:00:01 |
|*  7 |       HASH JOIN             |                             |  1189K|   106M|    38M|  6860   (1)| 00:01:57 |
|   8 |        TABLE ACCESS FULL    | ANCHOR_TABLE                |  1189K|    24M|       |   583   (2)| 00:00:10 |
|   9 |        TABLE ACCESS FULL    | A_TABLE                     |  1209K|    83M|       |  1191   (2)| 00:00:21 |
|* 10 |    FILTER                   |                             |       |       |       |            |          |
|* 11 |     VIEW                    |                             |  1189K|    70M|       |  4431   (1)| 00:01:16 |
|  12 |      TABLE ACCESS FULL      | SYS_TEMP_0FD9D662A_284C5768 |  1189K|    81M|       |  4431   (1)| 00:01:16 |
|* 13 |     VIEW                    |                             |  1189K|    18M|       |  4431   (1)| 00:01:16 |
|  14 |      TABLE ACCESS FULL      | SYS_TEMP_0FD9D662A_284C5768 |  1189K|    81M|       |  4431   (1)| 00:01:16 |
|* 15 |     VIEW                    |                             |  1189K|    18M|       |  4431   (1)| 00:01:16 |
|  16 |      TABLE ACCESS FULL      | SYS_TEMP_0FD9D662A_284C5768 |  1189K|    81M|       |  4431   (1)| 00:01:16 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("HORIZON_TABLE"."ID"="A_TABLE"."HORIZON_ID")
   7 - access("ANCHOR_TABLE"."ID"="A_TABLE"."ANCHOR_ID")
  10 - filter("D"."POSITION">= (SELECT "D1"."POSITION"-:B1 FROM  (SELECT + CACHE_TEMP_TABLE ("T1")  
              "C0" "ID","C1" "DESCR","C2" "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM 
              "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D1" WHERE "D1"."ANCHOR"=1) AND "D"."POSITION"<= (SELECT 
              "D2"."POSITION"+:B2 FROM  (SELECT + CACHE_TEMP_TABLE ("T1")  "C0" "ID","C1" "DESCR","C2" 
              "A_VALUE","C3" "OFFSET","C4" "ANCHOR","C5" "POSITION" FROM "SYS"."SYS_TEMP_0FD9D662A_284C5768" "T1") "D2" 
              WHERE "D2"."ANCHOR"=1))
  11 - filter("A_VALUE">=499500 AND "A_VALUE"<=500500)
  13 - filter("D1"."ANCHOR"=1)
  15 - filter("D2"."ANCHOR"=1)

Note
-----
   - dynamic sampling used for this statement (level=4)
*/

sqlfiddle

explain plan of sql http://www.sqlfiddle.com/#!4/6a7022/3

explain plan of view http://www.sqlfiddle.com/#!4/6a7022/2

Upvotes: 3

Views: 1595

Answers (2)

KevinKirkpatrick
KevinKirkpatrick

Reputation: 1456

You need to write a view definition which returns all possible selectable ranges of a_value as two columns, start_a_value and end_a_value, along with all records which fall into each start/end range. In other words, the correct view definition should logically describe a |n^3| result set given n rows in a_table.

Then query that view as:

SELECT * FROM testdata_vw WHERE START_A_VALUE = 4950 AND END_A_VALUE = 5050;

Also, your multiple references to "data" are unnecessary; same logic can be delivered with an additional analytic function.

Final view def:

CREATE OR REPLACE VIEW testdata_vw AS
SELECT  * 
FROM    
    (
    SELECT  T.*, 
            MAX(CASE WHEN ANCHOR=1 THEN POSITION END) 
                OVER (PARTITION BY START_A_VALUE, END_A_VALUE) ANCHOR_POS 
    FROM 
        (
        SELECT  S.A_VALUE                                       START_A_VALUE, 
                E.A_VALUE                                       END_A_VALUE, 
                B.ID                                            ID,
                B.DESCR                                         DESCR,
                HORIZON_TABLE.OFFSET                            OFFSET,
                CASE
                  WHEN ANCHOR_TABLE.A_DATE = TRUNC(SYSDATE) 
                    THEN 1
                    ELSE 0
                  END                                           ANCHOR,
                ROW_NUMBER() 
                    OVER(PARTITION BY S.A_VALUE, E.A_VALUE 
                         ORDER BY B.A_POSITION_FIELD)           POSITION
        FROM 
                A_TABLE S 
        JOIN    A_TABLE E 
                    ON S.A_VALUE<E.A_VALUE 
        JOIN    A_TABLE B 
                    ON B.A_VALUE BETWEEN S.A_VALUE AND E.A_VALUE
        JOIN    ANCHOR_TABLE
                    ON  ANCHOR_TABLE.ID = B.ANCHOR_ID
        JOIN    HORIZON_TABLE
                    ON HORIZON_TABLE.ID = B.HORIZON_ID
        ) T
    ) T
WHERE POSITION BETWEEN ANCHOR_POS - OFFSET AND ANCHOR_POS+OFFSET;

EDIT: SQL Fiddle with expected execution plan

I'm seeing the same (sensible) plan here that I saw in my database; if you're getting something different, please send fiddle link.

  1. Use index lookup to find 1 row in "S" A_TABLE (A_VALUE = 4950)
  2. Use index lookup to find 1 row in "E" A_TABLE (A_VALUE = 5050)
  3. Nested Loop join #1 and #2 (1 x 1 join, still 1 row)
  4. FTS 1 row from HORIZON table
  5. Cartesian join #1 and #2 (1 x 1, okay to use Cartesian).
  6. Use index lookup to find ~100 rows in "B" A_TABLE with values between 4950 and 5050.
  7. Cartesian join #5 and #6 (1 x 102, okay to use Cartesian).
  8. FTS ANCHOR_TABLE with hash join to #7.
  9. Window-sort for analytic functions

Upvotes: 3

Florin Ghita
Florin Ghita

Reputation: 17643

You have a predicate outside the view and you want to be applied in the view.

For this, you can use push_pred hint:

select /*+PUSH_PRED(v)*/
  *
from 
  testdata_vw v
where 
  a_value between 5000 - 50 and 5000 + 50;

SQLFIDDLE

EDIT: Now I've seen that you use the data subquery three times. For the first occurrence it makes sense to push the predicate, but for d1 and d2 it doesn't. It's another query.

What would I do is to use two context variables, set them according my needs and write the query:

SYS_CONTEXT('my_context_name', 'var5000');

create or replace view testdata_vw as
with data as (
  select
  a_table.id,
  a_table.descr,
  horizon_table.offset,
  case
    when anchor_table.a_date = trunc(sysdate) then
    1
    else
    0
  end as anchor,
  row_number() over( 
  order by a_table.a_position_field) as position
  from a_table
  join anchor_table on (anchor_table.id = a_table.anchor_id)
  join horizon_table on (horizon_table.id = a_table.horizon_id)
  where a_table.a_value between SYS_CONTEXT('my_context_name', 'var5000') - SYS_CONTEXT('my_context_name', 'var50') and SYS_CONTEXT('my_context_name', 'var5000') + SYS_CONTEXT('my_context_name', 'var50')
)
select * 
from data d
where d.position between ( 
    select d1.position - d.offset 
    from data d1 
    where d1.anchor = 1) 
  and ( 
    select d2.position + d.offset 
    from data d2 
    where d2.anchor = 1) ;

to use it:

dbms_session.set_context ('my_context_name', 'var5000', 5000);
dbms_session.set_context ('my_context_name', 'var50', 50);

select * from testdata_vw;

UPDATE: Instead of context variables(which can be used across sessions) you can use package variables as you commented.

Upvotes: 1

Related Questions