Reputation: 21282
Short story to explain how I got into this query performance mess:
I've been handed an old Oracle database to fix and maintain.
The data model was designed very badly, with no foresight.
The database is in terrible shape, and performs extremely slowly.
And of course, this database hasn't been tuned or modified in over 6 years ...
I'm starting with one of the many "snapshot" tables in this database.
(Rather than tracking history properly, the original developer just scheduled snapshots of records to be copied and stored in additional tables all over the place. So you have to query these snapshot tables to get historical analytics.)
This table has ~100 columns, but we don't care about most of them.
However, when I began work on this a couple days ago, this table had ONLY ONE INDEX: a NON-UNIQUE index of the ID
column, with no primary key constraint at all.
This is a snapshot table, meaning it contains historical copies of rows, from different points in time.
For example, row #12345 on 9/20/19, and again the same row #12345 but as it was on 9/30/19
So, alas, the ID
column must allow duplicate values.
So I figured, step one: create a compound primary key constraint on ID
and snapshot_date
together, to create a proper unique identifier.
The analytics query I'm trying to build is a frankenstein of many different pre-existing queries already being used in this application.
It looks like garbage, because that's what I have to work with ...
select efh.snapshot_date,
max(efhp.snapshot_date) as previous_snapshot_date,
substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) as edge_vp,
substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1) as edge_rm,
sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_FORECAST, 0) else 0 end) as forecast,
sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_BEST,0) else 0 end) as best,
sum(case when efh.oppty_status = 'Won' then NVL(efh.ARR,0) else 0 end) as closed,
sum(case when efh.oppty_status = 'Open' then nvl(efh.ARR_PIPELINE,0) else 0 end) as pipeline,
sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_BEST,0) else 0 end) +
sum(case when efh.oppty_status = 'Open' then nvl(efh.ARR_PIPELINE,0) else 0 end) as pipe_best,
sum(case when efh.oppty_status = 'Won' then efh.ARR else 0 end) +
sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_FORECAST,0) else 0 end) as closed_forecast
from edge_forecast_hist efh
left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
and efh.edge_asm != 'REDACTED'
and efh.oppty_status in ('Open', 'Won')
group by efh.snapshot_date,
substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1),
substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1)
order by 1, 2, 3, 4
At the beginning, this query took ~5 minutes to execute.
After I created a primary key, as well as an index on every single column being referenced in this query, the execution time dropped to ~4 minutes, which is definitely an improvement, but not as good as I expected.
(This query only returns a couple hundred rows.)
When I tried to explain
this query, I noticed that only a few of the indexes are actually being used.
(See three checkmarks indicating which indexes are being used, in the screenshot above.)
And there is some troubling language in the execution plan,
such as NESTED LOOPS
and TABLE ACCESS
(table scan??)
This is a terrible database, and I'm new to Oracle, and I don't totally understand the nuances of everything in the execution plan.
What appears to be the bottleneck here, and how might I mitigate it?
Some ideas that come to mind:
case
statements per column, in the select list, inside aggregate functions (SUM
)If it would really help, I can do several hours of work, trying to split those case
statements out.
(i.e. computing results in multiple subqueries, with the case
conditions moved into the where
clauses.)
But I also assumed Oracle was smart enough to find the most efficient execution plan, without needing to be spoon-fed.
In other words, I don't want to do all that work, just to end up with the exact same execution plan.
I need to know what the actual problem is, so I can produce a targeted solution that properly addresses the root-cause.
Upvotes: 3
Views: 2157
Reputation: 147146
Just one more version - no window functions (although they are a good idea) but JOIN
ing two derived tables (one with efh.oppty_status = 'Open'
and the other with efh.oppty_status = 'Won'
in order to get rid of all the conditionals in the SUMs. Also taking advantage of the fact that grouping by edge_vp
is the same as grouping by substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1)
since there is nothing in the string after @oracle.com
.
select o.snapshot_date,
greatest(o.previous_snapshot_date, w.previous_snapshot_date) as previous_snapshot_date,
substr(o.edge_vp,1,instr(o.edge_vp,'@oracle.com')-1) as edge_vp,
substr(o.edge_rm,1,instr(o.edge_rm,'@oracle.com')-1) as edge_rm,
o.forecast,
o.best,
w.closed,
o.pipeline,
o.best + o.pipeline AS pipe_best,
w.closed + o.forecast AS closed_forecast
from (select efh.snapshot_date,
max(efhp.snapshot_date) as previous_snapshot_date,
efh.edge_vp,
efh.edge_rm,
sum(NVL(efh.ARR_FORECAST, 0)) as forecast,
sum(NVL(efh.ARR_BEST,0)) as best,
sum(nvl(efh.ARR_PIPELINE,0)) as pipeline
from edge_forecast_hist efh
left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
and efh.edge_asm != 'REDACTED'
and efh.oppty_status = 'Open'
group by efh.snapshot_date,
efh.edge_vp,
efh.edge_rm
) o
join (select efh.snapshot_date,
max(efhp.snapshot_date) as previous_snapshot_date,
efh.edge_vp,
efh.edge_rm,
sum(NVL(efh.ARR,0)) as closed
from edge_forecast_hist efh
left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
and efh.edge_asm != 'REDACTED'
and efh.oppty_status = 'Won'
group by efh.snapshot_date,
efh.edge_vp,
efh.edge_rm
) w ON w.snapshot_date = o.snapshot_date AND w.edge_vp = o.edge_vp AND w.edge_rm = o.edge_rm
order by 1, 2, 3, 4
Upvotes: 1
Reputation: 11255
This has some similarity as @Gordon Linoff but uses the additional analytical function of Rows Unbounded Preceding
to create a cumulative sum which is what your query appears to do:
SELECT snapshot_date, previous_snapshot_date, edge_vp, edge_rm
, forecast, best, closed, pipeline
, best + pipeline AS pipe_best
, closed + forecast AS closed_forecast
FROM
(
SELECT efh.snapshot_date
, LAG(efh.snapshot_date) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date) as previous_snapshot_date
, substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) as edge_vp
, substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1) as edge_rm
, SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_FORECAST END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as forecast
, SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_BEST END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as best
, SUM(CASE WHEN efh.oppty_status = 'Won' THEN efh.ARR END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as closed
, SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_PIPELINE END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as pipeline
FROM edge_forecast_hist efh
WHERE efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
AND efh.edge_asm != 'REDACTED'
AND efh.oppty_status in ('Open', 'Won')
)
Upvotes: 2
Reputation: 1269503
I am pretty sure that you can replace this query with one using analytic functions, which would be much faster.
The filtering makes this a bit tricky. In general, that means running the window functions in a subquery and then filtering in an outer query. However, there could be an interplay with the JOIN
conditions.
All that said, the query you want is something like this:
select efh.snapshot_date, efh.previous_snapshot_date,
efh.edge_vp, efh.edge_rm,
efh.forecast,
. . .
from (select eft.*,
lag(efh.snapshot_date) over (partition by efh.edge_vp, efh.edge_rm order by efh.snapshot_date) as previous_snapshot_date,
substr(efh.edge_vp, 1, instr(efh.edge_vp,'@oracle.com')-1) as edge_vp,
substr(efh.edge_rm, 1, instr(efh.edge_rm,'@oracle.com')-1) as edge_rm,
sum(case when efh.oppty_status = 'Open' then efh.ARR_FORECAST else 0 end) over (partition by efh.edge_vp, efh.edge_rm order by efh.snapshot_date) as forecast,
. . .
from edge_forecast_hist efh
) efh
where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY and
efh.edge_asm <> 'REDACTED' and
efh.oppty_status in ('Open', 'Won');
The additional columns should be following the same structure as forecast
.
Upvotes: 3