Reputation: 3391
I have a data re-population scheme for a set of Oracle 11g database tables that entails:
DROP TABLE TCA_SECURITY_DAY_BAR CASCADE CONSTRAINTS PURGE
At various points in the load process I intersperse gather-statistics commands, which I've found greatly speed up queries within the load process itself. However, I have little confidence in my options because they were chosen by guesswork.
dbms_stats.gather_table_stats(
    ownname          => 'TCA',
    tabname          => 'TCA_SECURITY_DAY_BAR',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    block_sample     => TRUE,
    method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
    cascade          => TRUE
);
dbms_stats.gather_schema_stats(
    ownname          => 'TCA',
    estimate_percent => NULL,
    block_sample     => TRUE,
    method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
    cascade          => TRUE,
    granularity      => 'ALL',
    options          => 'GATHER'
);
After a reload process I run some reporting, and I have a class of problem query that takes 7 seconds when it can/should be sub-second. I already have improvements for the queries, such as using bind parameters and being more specific in the WHERE clause. With this question I'm not trying to improve the query itself, but rather to improve the plan selected by the engine, or at least understand its mysterious behavior.
The queries are generated from a list of column expressions (non-grouped, aggregated), where I chunk the list into some number of columns (e.g. 3, 4, 5) to see what looks best visually in the report. For a given set of join clauses, the number of column expressions selected, plus some other factor (time of day, caching, temperature, who knows), is affecting whether the optimizer chooses a fast plan or a slow plan.
Here is the sample query with 1 grouped column (multiplier) and 5 non-grouped aggregated expressions:
SELECT
    TO_.multiplier                                 -- multiplier.bracket
  , min(TCA.TCA_SECURITY.symbol)                   -- symbol.min
  , min(TO_.px_limit)                              -- px_limit.min
  , min(TCA.TCA_SECURITY_DAY_BAR.px_open_day)      -- px_open_day.min
  , min(TCA.TCA_SECURITY_DAY_BAR.px_high_day)      -- px_high_day.min
  , min(TCA.TCA_SECURITY_DAY_BAR.px_low_day)       -- px_low_day.min
FROM TCA_ORDER TO_
LEFT JOIN TCA_ORDER_SECURITY ON (TO_.ID = TCA_ORDER_SECURITY.ORDER_ID)
LEFT JOIN TCA.TCA_SECURITY TCA_SECURITY ON (TCA_ORDER_SECURITY.SECURITY_ID = TCA_SECURITY.ID)
LEFT JOIN TCA.TCA_SECURITY_DAY_BAR ON (
    TCA_SECURITY.ID = TCA_SECURITY_DAY_BAR.SECURITY_ID
    AND TO_.TRADE_DATE = TCA_SECURITY_DAY_BAR.TRADE_DATE
)
WHERE TRADING_UNIT_ID IN (621)
GROUP BY TO_.multiplier
Observations:
I know the schema is set up to have statistics run nightly, but I don't really know the exact command used for that. If those nightly statistics-gathering calls are not special and I could call them myself on demand, that may help - I'd just need to know how to mimic them exactly.
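A sketch of what I imagine mimicking it would look like (assuming the nightly job just uses the per-table DBMS_STATS preferences, which GET_PREFS can show - the table name is mine from above):

```sql
-- Show the preferences the automatic stats job would use for this table:
SELECT DBMS_STATS.GET_PREFS('METHOD_OPT',       'TCA', 'TCA_SECURITY_DAY_BAR') AS method_opt
     , DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT', 'TCA', 'TCA_SECURITY_DAY_BAR') AS estimate_percent
     , DBMS_STATS.GET_PREFS('CASCADE',          'TCA', 'TCA_SECURITY_DAY_BAR') AS cascade_pref
FROM   dual;

-- Gathering with no overrides then uses those same preferences,
-- which should be close to what the nightly job does for a stale table:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'TCA', tabname => 'TCA_SECURITY_DAY_BAR');
END;
/
```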
To try to troubleshoot, in my Kotlin/JDBC code I've instrumented with calls that explain the last plan run. Here is a diff showing a query with 5 columns that is fast and one with 4 columns that is slow. It is clear that for some reason it is selecting "NESTED LOOPS OUTER" in the slow case, and I don't know why.
So I'm looking for an explanation of what may be going on and suggestions to impact the optimizer's choice without changing the query.
Upvotes: 2
Views: 452
Reputation: 36922
If you have an unknown statistics problem, the first step is to go back to the defaults:
dbms_stats.gather_table_stats(
ownname => 'TCA',
tabname => 'TCA_SECURITY_DAY_BAR'
);
The default values will likely be both faster and more accurate, since 11g introduced a new Number of Distinct Values (NDV) optimization. But changing ESTIMATE_PERCENT or BLOCK_SAMPLE may disable that optimization.
If the statistics are still bad after a few days, look at DBA_TAB_COLUMNS.HISTOGRAM and see if the values are changing. The default statistics job only gathers histograms on columns that have been used in a relevant query. If you drop the table, Oracle will forget about all the previous queries on that table and won't gather the relevant histograms until the queries are re-run. (To get around this problem, instead of messing with METHOD_OPT, you can run some dummy SELECT queries after the table is recreated.)
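For example, a sketch of both checks (owner, table, and filter column assumed from the question):

```sql
-- See which columns have histograms today; rerun after the nightly job
-- to see whether it changes anything:
SELECT column_name, histogram, num_distinct, last_analyzed
FROM   dba_tab_columns
WHERE  owner = 'TCA'
AND    table_name = 'TCA_SECURITY_DAY_BAR'
ORDER BY column_name;

-- A dummy query after the reload records column usage, so the next
-- default gather (METHOD_OPT 'FOR ALL COLUMNS SIZE AUTO') can decide
-- a histogram is worthwhile on the filtered column:
SELECT COUNT(*)
FROM   TCA.TCA_SECURITY_DAY_BAR
WHERE  TRADE_DATE = DATE '2000-01-01';
```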
Edit: How to find changes
You can find if the execution plan changes for a query over time with a query like this:
select begin_interval_time, plan_hash_value, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
where sql_id = '<your SQL_ID>'
order by begin_interval_time desc;
The most common reason for plan changes is statistics changes. You can check for unexpected stats changes in DBA_TABLES.LAST_ANALYZED and DBA_OPTSTAT_OPERATIONS.
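For example, a couple of quick checks (the schema name is taken from the question):

```sql
-- When were each table's statistics last gathered?
SELECT table_name, last_analyzed
FROM   dba_tables
WHERE  owner = 'TCA'
ORDER BY last_analyzed DESC;

-- What statistics operations ran, and when?
SELECT operation, target, start_time, end_time
FROM   dba_optstat_operations
ORDER BY start_time DESC;
```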
Caching is another possible reason for performance changes. Oracle's most important cache, the buffer cache, stores blocks: chunks of data from tables or indexes, typically 8 KB each. It's possible that another query reads from the tables and caches the blocks, which accidentally helps your query run faster.
Adaptive reoptimizations can dynamically change the way plans work, but there weren't as many of those options in 11g. Still, you should look at the "Note" section of the execution plans to see if there is any weird plan management going on. (You've posted a lot of good execution plan details, but the "Note" section is missing from the bottom.)
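For example, a sketch of pulling the last plan with its Note section ('ALLSTATS LAST' only shows actual row counts if the statement ran with the GATHER_PLAN_STATISTICS hint or STATISTICS_LEVEL=ALL; otherwise drop it and use the default format):

```sql
-- Show the actual execution plan of the last statement run in this
-- session, including the Note section at the bottom:
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST +NOTE'));
```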
NESTED LOOPs are best for joining a small percentage of rows, and HASH JOINs are the best choice for querying a large percentage of rows. It's best when we can provide Oracle with good statistics and let it automatically decide which join types are best. But if you don't want to look into this anymore, it's OK to eventually cheat and use a hint like /*+ USE_HASH(TABLE1 TABLE2) */. We can't always get to the root cause of every performance problem.
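For instance, a simplified sketch against the question's query (aliases taken from the question; I've dropped the intermediate joins for brevity):

```sql
-- Force a hash join between the order table and the day-bar table.
-- This is a workaround, not a root-cause fix.
SELECT /*+ USE_HASH(TO_ TCA_SECURITY_DAY_BAR) */
       TO_.multiplier
     , MIN(TCA_SECURITY_DAY_BAR.px_open_day)
FROM   TCA_ORDER TO_
LEFT JOIN TCA.TCA_SECURITY_DAY_BAR ON (
       TO_.TRADE_DATE = TCA_SECURITY_DAY_BAR.TRADE_DATE
)
WHERE  TRADING_UNIT_ID IN (621)
GROUP BY TO_.multiplier;
```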
Finally, to take a more detailed look into what changed, look at DBA_HIST_ACTIVE_SESS_HISTORY. It contains samples, taken every 10 seconds, of which query was running, which plan operation it was executing, and what event it was waiting on. It takes some work, but you can use that view to compare almost all the details between multiple runs of a query, and also to check for other system activity.
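For example, a starting-point sketch (substitute the SQL_ID placeholder as in the earlier query):

```sql
-- Summarize what each captured run of the statement spent its time on:
SELECT sql_plan_hash_value, sql_plan_operation, event, COUNT(*) AS samples
FROM   dba_hist_active_sess_history
WHERE  sql_id = '<your SQL_ID>'
GROUP BY sql_plan_hash_value, sql_plan_operation, event
ORDER BY samples DESC;
```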
This can be a lot of frustrating work. Oracle gets the plans right 99.999% of the time, and they provide tools to deal with the outliers, but it's not always simple.
Upvotes: 1
Reputation: 303
Moreover, for quicker stats gathering on large tables you can set "estimate_percent" to a small number (5-10%). This will gather stats quickly, but it might interfere with the automatic stats gathering.
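For example, a sketch using the table from the question (note that supplying a fixed ESTIMATE_PERCENT bypasses the 11g AUTO_SAMPLE_SIZE NDV optimization mentioned above):

```sql
-- Quick-and-dirty gather at a fixed 10% sample:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TCA',
    tabname          => 'TCA_SECURITY_DAY_BAR',
    estimate_percent => 10
  );
END;
/
```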
Upvotes: 1