Giffyguy
Giffyguy

Reputation: 21282

Why is this Oracle query so slow?

Short story to explain how I got into this query performance mess:
I've been handed an old Oracle database to fix and maintain.

The data model was designed very badly, with no foresight.
The database is in terrible shape, and performs extremely slowly.
And of course, this database hasn't been tuned or modified in over 6 years ...

I'm starting with one of the many "snapshot" tables in this database.
(Rather than tracking history properly, the original developer just scheduled snapshots of records to be copied and stored in additional tables all over the place. So you have to query these snapshot tables to get historical analytics.)

This table has ~100 columns, but we don't care about most of them.
However, when I began work on this a couple days ago, this table had ONLY ONE INDEX: a NON-UNIQUE index of the ID column, with no primary key constraint at all.

This is a snapshot table, meaning it contains historical copies of rows, from different points in time.
For example, row #12345 on 9/20/19, and again the same row #12345 but as it was on 9/30/19
So, alas, the ID column must allow duplicate values.

So I figured, step one: create a compound primary key constraint on ID and snapshot_date together, to create a proper unique identifier.

The analytics query I'm trying to build is a frankenstein of many different pre-existing queries already being used in this application.
It looks like garbage, because that's what I have to work with ...

select efh.snapshot_date,
       max(efhp.snapshot_date) as previous_snapshot_date,
       substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) as edge_vp,
       substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1) as edge_rm,
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_FORECAST, 0) else 0 end) as forecast,
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_BEST,0) else 0 end) as best,
       sum(case when efh.oppty_status = 'Won' then NVL(efh.ARR,0) else 0 end) as closed,
       sum(case when efh.oppty_status = 'Open' then nvl(efh.ARR_PIPELINE,0) else 0 end) as pipeline,
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_BEST,0) else 0 end) +
       sum(case when efh.oppty_status = 'Open' then nvl(efh.ARR_PIPELINE,0) else 0 end) as pipe_best,
       sum(case when efh.oppty_status = 'Won' then efh.ARR else 0 end) +
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_FORECAST,0) else 0 end) as closed_forecast

  from edge_forecast_hist efh
  left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
 where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
   and efh.edge_asm != 'REDACTED'
   and efh.oppty_status in ('Open', 'Won')

group by efh.snapshot_date,
         substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1),
         substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1)
order by 1, 2, 3, 4

At the beginning, this query took ~5 minutes to execute.

Click on image to enlarge
enter image description here

After I created a primary key, as well as an index on every single column being referenced in this query, the execution time dropped to ~4 minutes, which is definitely an improvement, but not as good as I expected.
(This query only returns a couple hundred rows.)

When I tried to explain this query, I noticed that only a few of the indexes are actually being used.
(See three checkmarks indicating which indexes are being used, in the screenshot above.)

And there is some troubling language in the execution plan,
such as NESTED LOOPS and TABLE ACCESS (table scan??)

Click on image to enlarge
enter image description here

This is a terrible database, and I'm new to Oracle, and I don't totally understand the nuances of everything in the execution plan.
What appears to be the bottleneck here, and how might I mitigate it?

Some ideas that come to mind:

If it would really help, I can do several hours of work, trying to split those case statements out.
(i.e. computing results in multiple subqueries, with the case conditions moved into the where clauses.)
But I also assumed Oracle was smart enough to find the most efficient execution plan, without needing to be spoon-fed.
In other words, I don't want to do all that work, just to end up with the exact same execution plan.
I need to know what the actual problem is, so I can produce a targeted solution that properly addresses the root-cause.

Upvotes: 3

Views: 2157

Answers (3)

Nick
Nick

Reputation: 147146

Just one more version - no window functions (although they are a good idea) but JOINing two derived tables (one with efh.oppty_status = 'Open' and the other with efh.oppty_status = 'Won' in order to get rid of all the conditionals in the SUMs. Also taking advantage of the fact that grouping by edge_vp is the same as grouping by substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) since there is nothing in the string after @oracle.com.

select o.snapshot_date, 
       greatest(o.previous_snapshot_date, w.previous_snapshot_date) as previous_snapshot_date,
       substr(o.edge_vp,1,instr(o.edge_vp,'@oracle.com')-1) as edge_vp,
       substr(o.edge_rm,1,instr(o.edge_rm,'@oracle.com')-1) as edge_rm,
       o.forecast,
       o.best,
       w.closed,
       o.pipeline,
       o.best + o.pipeline AS pipe_best,
       w.closed + o.forecast AS closed_forecast
from (select efh.snapshot_date,
             max(efhp.snapshot_date) as previous_snapshot_date,
             efh.edge_vp,
             efh.edge_rm,
             sum(NVL(efh.ARR_FORECAST, 0)) as forecast,
             sum(NVL(efh.ARR_BEST,0)) as best,
             sum(nvl(efh.ARR_PIPELINE,0)) as pipeline
      from edge_forecast_hist efh
      left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
      where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
        and efh.edge_asm != 'REDACTED'
        and efh.oppty_status = 'Open'
      group by efh.snapshot_date,
               efh.edge_vp,
               efh.edge_rm
      ) o
join (select efh.snapshot_date,
             max(efhp.snapshot_date) as previous_snapshot_date,
             efh.edge_vp,
             efh.edge_rm,
             sum(NVL(efh.ARR,0)) as closed
      from edge_forecast_hist efh
      left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
      where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
        and efh.edge_asm != 'REDACTED'
        and efh.oppty_status = 'Won'
      group by efh.snapshot_date,
               efh.edge_vp,
               efh.edge_rm
      ) w ON w.snapshot_date = o.snapshot_date AND w.edge_vp = o.edge_vp AND w.edge_rm = o.edge_rm
order by 1, 2, 3, 4

Upvotes: 1

Cole
Cole

Reputation: 11255

This has some similarity as @Gordon Linoff but uses the additional analytical function of Rows Unbounded Preceding to create a cumulative sum which is what your query appears to do:

SELECT snapshot_date, previous_snapshot_date, edge_vp, edge_rm
, forecast, best, closed, pipeline
, best + pipeline AS pipe_best
, closed + forecast AS closed_forecast

FROM
(
  SELECT efh.snapshot_date
  , LAG(efh.snapshot_date) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date) as previous_snapshot_date
  , substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) as edge_vp
  , substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1) as edge_rm
  , SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_FORECAST END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as forecast
  , SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_BEST END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as best
  , SUM(CASE WHEN efh.oppty_status = 'Won' THEN efh.ARR END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as closed
  , SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_PIPELINE END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as pipeline

  FROM edge_forecast_hist efh

  WHERE efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
   AND efh.edge_asm != 'REDACTED'
   AND efh.oppty_status in ('Open', 'Won')
)

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269503

I am pretty sure that you can replace this query with one using analytic functions, which would be much faster.

The filtering makes this a bit tricky. In general, that means running the window functions in a subquery and then filtering in an outer query. However, there could be an interplay with the JOIN conditions.

All that said, the query you want is something like this:

select efh.snapshot_date, efh.previous_snapshot_date,
       efh.edge_vp, efh.edge_rm,
       efh.forecast,
       . . .
from (select eft.*,
             lag(efh.snapshot_date) over (partition by efh.edge_vp, efh.edge_rm order by efh.snapshot_date) as previous_snapshot_date,
             substr(efh.edge_vp, 1, instr(efh.edge_vp,'@oracle.com')-1) as edge_vp,
             substr(efh.edge_rm, 1, instr(efh.edge_rm,'@oracle.com')-1) as edge_rm,
             sum(case when efh.oppty_status = 'Open' then efh.ARR_FORECAST else 0 end) over (partition by efh.edge_vp, efh.edge_rm order by efh.snapshot_date) as forecast,
             . . . 
      from edge_forecast_hist efh
     ) efh
where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY and
      efh.edge_asm <> 'REDACTED' and
      efh.oppty_status in ('Open', 'Won');

The additional columns should be following the same structure as forecast.

Upvotes: 3

Related Questions