Benjamin
Benjamin

Reputation: 643

Optimize slow correlated query in Oracle SQL

I made a query which works, but is a little slow in my opinion. When I suppress the output to 10 rows, it takes 13 minutes to perform the query. This is the query, stripped from some stuff:

SELECT
    (SELECT ANSWER
        FROM (
            SELECT to_number(fiit.ANSWER, '999') ANSWER,
                    foin.CLIENT_ID id,
                    foin.STARTDATE start_date,
                    row_number() over(PARTITION BY foin.CLIENT_ID ORDER BY foin.FORM_ID ASC) rnk
                FROM forms_filled foin, forms_items_filled fiit, treatment trtm
                WHERE foin.FORM_ID = fiit.FORM_ID
                AND foin.CLIENT_ID = trtm.CLIENT_ID
                AND fiit.FORM_NUMBER = 607
                AND fiit.FORM_ITEM_NUMBER = 3779
                AND length(fiit.ANSWER) >= 1
                AND trtm.TREATMENTCODE = 'K'
                AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD') AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
                AND foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, to_date('01/01/9999', 'dd/mm/yyyy'))
                ) inn
    WHERE rnk = 1
    AND inn.id = client.CLIENT_ID
    ) form1
FROM treatment trtm, CLIENT client
WHERE trtm.TREATMENTCODE = 'K'
AND client.CLIENT_ID = trtm.CLIENT_ID
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD') AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')

The outer query results in 175 clients who have a particular treatment code and have a treatment enddate in 2014. Now for each of these clients a lot of other data is retrieved (like name, age, treatment hours) which is not relevant and I left out now. Then there are about 30 similar subqueries, which retrieve answers from forms. I used a correlated query, because to retrieve the answers from these forms, the client id must be known. If that was the only thing the subquery needed to find the data, it wouldn't be a problem, but there is an other requirement: the retrieved forms must be filled in within the treatment period, because I couldn't find a way to push this data from the outer query to the sub-sub-query, I query this again in the subsubquery, which is causing the slow speed.

The reason to have a subquery and a subsubquery is because the N-th ranking answer from a form must be found. In a previous version of my code I didn't have the treatmentcode, treatment start and enddate requirements in the where clause of the subsubquery. This caused the subsubquery to come up with e.g. 4 results which were ranked 1,2,3,4 but not necessarily of forms created within the treatment period, which is wrong.

So adding these lines:

AND trtm.TREATMENTCODE = 'K'
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD')
AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
AND foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, to_date('01/01/9999', 'dd/mm/yyyy'))

caused the query to be correct, where it was not completely correct before. They also caused the query to take several hours, instead of ~40 seconds for 175 rows.

My question now is, how can I rewrite this query to make it faster? I use Oracle 11.2.40 in combination with Toad Data Point 3.5, but I can't see the explain plan unfortunately.

Upvotes: 0

Views: 296

Answers (2)

TommCatt
TommCatt

Reputation: 5636

You have a lot of superfluous constructs in here, like TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD'). You are calling trunc saying "strip off any time component" and then passing it a constructed date without a time component to begin with. Just say date '2014-01-01' and be done with it.

As for working with date ranges, if you want to select dates that are in, say, 2014, the best way is to compare like this: myDate >= date '2014-01-01' and myDate < date '2015-01-01'. This way you don't have to worry about myDate having a time component and what that time value may be. Save between for data types with discreet values or dates you know are already in the discreet components you want.

None of these suggestion are a solution to your particular problem. But get into the habit of writing "skinny" code in the first place and it will simplify your search for the problem if it ever runs too slowly.

One major recommendation that will probably speed up your query but even if not will significantly simplify it so increases the maintainability, is to remove subqueries from the select list.

In general, for complicated queries, don't try to write the entire thing in one go. Select a table (treatment in your case) and select the data you know you will be needing. Examine the results. Get to know it if you don't already. Make sure it is complete and accurate.

select  t.CLIENT_ID, t.TREATMENTCODE, t.ENDDATE
from    treatment t
where   t.TREATMENTCODE = 'K'
    and t.ENDDATE >= date '2014-01-01'
    and t.ENDDATE  < date '2015-01-01';

Now join the next table to it, adding to the select list the data you want to see from that table, removing the data from the first that you are satisfied with and don't otherwise need.

select  t.CLIENT_ID, c.CLIENT_ID
from    treatment t
join    client    c
    on  c.CLIENT_ID = t.CLIENT_ID
where   t.TREATMENTCODE = 'K'
    and t.ENDDATE >= date '2014-01-01'
    and t.ENDDATE  < date '2015-01-01';

Add any fields you need in the select list to verify you are getting exactly the right result (for the conditions you have specified so far). Repeat for each of the other tables, adding them one at a time until you get the final result. This way, if you suddenly start getting wrong results, you'll know which table started the problem.

Your final result set may contain lots of rows you don't need. That's ok as long as it contains all of the rows you do need. Save the filtering for the last as you want to be able to see all the data the query is generating. When you know the data contains everything you need, the final step is to filter out unwanted results until you have only what you want. But being able to see all the data can show you many ways to perform that filtering, ways that may not be obvious if you filter out data early.

I don't have any test data so I can't test my candidate below. However, it should be fairly close unless I've completely missed something (a distinct possibility). If nothing else, maybe it can point you toward a solution.

SELECT  c.CLIENT_ID, to_number( fif.ANSWER, '999' ) form1
FROM    treatment   t
join    CLIENT      c
    on  c.CLIENT_ID     = t.CLIENT_ID
join    forms_filled ff
    on  ff.CLIENT_ID    = c.CLIENT_ID
join    forms_items_filled fif
    on  fif.FORM_ID     = ff.FORM_ID
WHERE   t.TREATMENTCODE = 'K'
    and fif.FORM_NUMBER = 607
    AND fif.FORM_ITEM_NUMBER = 3779
    AND length( fif.ANSWER ) >= 1
    AND t.ENDDATE >= date '2014-01-01'
    AND t.ENDDATE  < date '2015-01-01'
    AND ff.STARTDATE BETWEEN t.STARTDATUM AND NVL(t.ENDDATE, date '9999-12-31');

One other suggestion: when you have a field like end_date, the first impulse is to use NULL as an indicator of "there is no end date yet defined." Try setting it to NOT NULL and using a default of the Max Date value, date '9999-12-31'. This means the same thing and simplifies comparisons by getting rid of the need for nvl or other ways of handling NULL values.

Edit: Oops. I had moved the windowing function to get it out of the way because I was only after partial results. It got included when I cut/pasted the code.

Oh well, may as well include that in a final answer.

with
Partial( CLIENT_ID, form1, rnk )as(
    SELECT  c.CLIENT_ID, to_number( fif.ANSWER, '999' ) form1,
            row_number() over(PARTITION BY ff.CLIENT_ID ORDER BY ff.FORM_ID ASC) rnk
    FROM    treatment   t
    join    CLIENT      c
        on  c.CLIENT_ID     = t.CLIENT_ID
    join    forms_filled ff
        on  ff.CLIENT_ID    = c.CLIENT_ID
    join    forms_items_filled fif
        on  fif.FORM_ID     = ff.FORM_ID
    WHERE   t.TREATMENTCODE = 'K'
        and fif.FORM_NUMBER = 607
        AND fif.FORM_ITEM_NUMBER = 3779
        AND fif.ANSWER is not null
        AND t.ENDDATE >= date '2014-01-01'
        AND t.ENDDATE  < date '2015-01-01'
        AND ff.STARTDATE BETWEEN t.STARTDATUM AND NVL(t.ENDDATE, date '9999-12-31')
)
select  CLIENT_ID, form1
from    Partial
where   rnk = 1;

Assuming this gets you pretty close, if you look at the execution plan between this one and your original, you should see a significant improvement.

And one other change. You are testing a string to make sure it has at least one character. In Oracle that is not necessary as an empty string is treated as a NULL. Just check for NOT NULL.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269973

If you use the keep keyword to get the first value, you can dispense with the nested subqueries. This, in turn, allows you to use a query correlated with the outer query, so you don't have to re-calculate the results for all rows to get the value for a given row.

The query would look like:

SELECT (SELECT max(to_number(fiit.ANSWER, '999')) keep (dense_rank first order by foin.FORM_ID ASC)
        FROM forms_filled foin JOIN
             forms_items_filled fiit
             ON foin.FORM_ID = fiit.FORM_ID
        WHERE foin.CLIENT_ID = trtm.CLIENT_ID AND
              fiit.FORM_NUMBER = 607
              fiit.FORM_ITEM_NUMBER = 3779 AND
              length(fiit.ANSWER) >= 1 AND
              foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, DATE '1999-01-01')
       )

I also encourage you to use modern explicit join syntax and the date keyword for expressing date constants.

Upvotes: 2

Related Questions