How can I improve this oracle 11g plan selection without changing query?

I have a data re-population scheme for a set of oracle 11g database tables that entails:

Drop and recreate all tables. Drop is like DROP TABLE TCA_SECURITY_DAY_BAR CASCADE CONSTRAINTS PURGE
Load static data - i.e. lookup tables, then load some data associated with dates

At various times in load process I intersperse gather statistics commands, which I've found greatly sped up various queries of load process itself. However, I have little confidence in my options because they were selected via guesswork.

Targeted table gather invocations. Example:

dbms_stats.gather_table_stats(                            
    ownname => 'TCA',                                     
    tabname => 'TCA_SECURITY_DAY_BAR',               
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,      
    block_sample => TRUE,                                 
    method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1',       
    cascade => TRUE                                       
);

Schema invocations: Example:

dbms_stats.gather_schema_stats(
    ownname => 'TCA',
    estimate_percent => NULL,
    block_sample => TRUE,
    method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1',
    cascade => TRUE,
    granularity => 'ALL',
    options => 'GATHER'                                                                                                                     
)

After a reload process I run some reporting and I have class of problem query that is taking 7 seconds when it can/should be sub-second. I already have improvements for the queries, such as using parameters and being more specific in the where clause. With this question I'm not trying to improve the query but rather trying to improve the plan selected by the engine or at least understand its mysterious behavior.

The queries are generated from a list of column expressions (non-grouped, aggregated) where I chunk the list into some number of columns (e.g. 3, 4, 5) to see what visually looks best in the report. For a given set of join clauses the number of columns expressions selected and some other factor (time of day, caching, temperature, who knows) is effecting whether the optimizer is choosing a fast plan or a slow plan.

Here is the sample query with 1 grouped column multiplier and 5 non-grouped aggregated expressions:

SELECT
    TO_.multiplier
    -- multiplier.bracket
    ,   min(TCA.TCA_SECURITY.symbol)
    -- symbol.min
    ,   min(TO_.px_limit)
    -- px_limit.min
    ,   min(TCA.TCA_SECURITY_DAY_BAR.px_open_day)
    -- px_open_day.min
    ,   min(TCA.TCA_SECURITY_DAY_BAR.px_high_day)
    -- px_high_day.min
    ,   min(TCA.TCA_SECURITY_DAY_BAR.px_low_day)
    -- px_low_day.min
FROM TCA_ORDER TO_
    LEFT JOIN TCA_ORDER_SECURITY ON (TO_.ID = TCA_ORDER_SECURITY.ORDER_ID)
    LEFT JOIN TCA.TCA_SECURITY TCA_SECURITY ON (TCA_ORDER_SECURITY.SECURITY_ID = TCA_SECURITY.ID)
    LEFT JOIN TCA.TCA_SECURITY_DAY_BAR ON (
        TCA_SECURITY.ID = TCA_SECURITY_DAY_BAR.SECURITY_ID
        AND TO_.TRADE_DATE = TCA_SECURITY_DAY_BAR.TRADE_DATE
    )

WHERE TRADING_UNIT_ID IN (621)
GROUP BY TO_.multiplier

Observations:

When comparing the queries with different numbers of columns I'm not running statistics in between and I believe they are not being run (since setup is to do it nightly).
Once a pattern of a slow query is in place it is consistent for some time.
Initially, I found that the query with 5 columns was always slow (> 7 seconds)
If I only had 4 it was fast.
The next day 5 was suddenly fast. But selecting 4 columns was slow.
When I copy/pasted the logged slow running queries into SQL Developer they were fast
As of this morning - 3 is fast, 4 is slow and 5 is fast

I know the schema is setup to have statistics run nightly, but I don't really know the exact command used for that. If those nightly statistics gathering calls are not special and I could call them myself on-demand that may help - I'd just need to know how to exactly mimic it.

To try to troubleshoot, in my kotlin/jdbc code I've instrumented with calls to explain last plan run. Here is a diff showing a query with 5 columns that is fast and 4 columns that is slow. It is clear that for some reason it is selecting "NESTED LOOPS OUTER" in the slow case and I don't know why.

So I'm looking for an explanation of what may be going on and suggestions to impact the optimizer's choice without changing the query.

Upvotes: 2

Answers (2)

Jon Heller

Reputation: 36922

If you have an unknown statistics problem, the first step is to go back to the defaults:

dbms_stats.gather_table_stats(
    ownname => 'TCA',
    tabname => 'TCA_SECURITY_DAY_BAR'
);

The default values will likely be both faster and more accurate since 11g introduced a new Number of Distinct Values (NDV) optimization. But changing ESTIMATE_PERCENT or BLOCK_SAMPLE may disable that optimization.

If the statistics are still bad for a few days, look at DBA_TAB_COLUMNS.HISTOGRAM and see if they are changing. The default statistics job only gathers histograms on columns that have been used in a relevant query. If you drop the table, Oracle will forgot about all the previous queries on that table and won't gather the relevant histograms until the queries are re-run. (To get around this problem, instead of messing with METHOD_OPT, you can run some dummy SELECT queries after the table is recreated.)

Edit: How to find changes

You can find if the execution plan changes for a query over time with a query like this:

select begin_interval_time, plan_hash_value, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
    on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
where sql_id = '<your SQL_ID>'
order by begin_interval_time desc;

The most common reason for plan changes is statistics changes. You can check for unexpected stats changes in DBA_TABLES.LAST_ANALYZED and DBA_OPTSTAT_OPERATIONS.

Caching is another possible reason for performance changes. Oracle's most important cache stores blocks, 8 KB chunks of data from tables or indexes. It's possible that another query might read from the tables and cache the blocks, which accidentally helps your query run faster.

Adaptive reoptimizations can dynamically change the way plans work, but there weren't as many of those options in 11g. Still, you should look at the "Note" section of the execution plans to see if there is any weird plan management going on. (You've posted a lot of good execution plan details, but the "Note" section is missing from the bottom.)

NESTED LOOPs are best for joining a small percentage of rows, and HASH JOINs are the best choice for querying a large percentage of rows. It's best when we can provide Oracle with good statistics, and let it automatically decide which join types are best. But if you don't want to look into this anymore, it's OK to eventually cheat and use a hint like /*+ USE_HASH(TABLE1 TABLE2) */. We can't always get to the root cause of every performance problem.

Finally, to take a more detailed look into what changed, look at DBA_HIST_ACTIVE_SESS_HISTORY. It contains a sample from every 10 seconds of what query was running which operation, and what event the query was waiting on. It takes some work, but you can use that view to compare almost all the details between multiple runs of queries, and to also check for system activity.

This can be a lot of frustrating work. Oracle gets the plans right 99.999% of the time, and they provide tools to deal with the outliers, but it's not always simple.

Upvotes: 1

djanoti

Reputation: 303

if you are going to fire same queries after your
drop/recreate/populate and your data doesn't change vastly, you can you SQL plan baseline. basically build a good plan and force oracle to use it when the same query is executed. More you can find here : https://docs.oracle.com/database/121/TGSQL/tgsql_spm.htm#TGSQL94621
if option 1) doesn't work for you, you can also enable dynamic sampling. this will basically collect stats at run time for a small subset of rows if stats are not already gathered. : https://oracle-base.com/articles/12c/dynamic-statistics-12cr1

Moreover, for quick gathering stats of tables you can set "estimate_percent" to s small number(5-10%), this will quickly gather stats, but might screw with auto gather stats.

Upvotes: 1

How can I improve this oracle 11g plan selection without changing query?

Answers (2)

Related Questions