Reputation: 643
I made a query which works, but is a little slow in my opinion. When I suppress the output to 10 rows, it takes 13 minutes to perform the query. This is the query, stripped from some stuff:
SELECT
(SELECT ANSWER
FROM (
SELECT to_number(fiit.ANSWER, '999') ANSWER,
foin.CLIENT_ID id,
foin.STARTDATE start_date,
row_number() over(PARTITION BY foin.CLIENT_ID ORDER BY foin.FORM_ID ASC) rnk
FROM forms_filled foin, forms_items_filled fiit, treatment trtm
WHERE foin.FORM_ID = fiit.FORM_ID
AND foin.CLIENT_ID = trtm.CLIENT_ID
AND fiit.FORM_NUMBER = 607
AND fiit.FORM_ITEM_NUMBER = 3779
AND length(fiit.ANSWER) >= 1
AND trtm.TREATMENTCODE = 'K'
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD') AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
AND foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, to_date('01/01/9999', 'dd/mm/yyyy'))
) inn
WHERE rnk = 1
AND inn.id = client.CLIENT_ID
) form1
FROM treatment trtm, CLIENT client
WHERE trtm.TREATMENTCODE = 'K'
AND client.CLIENT_ID = trtm.CLIENT_ID
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD') AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
The outer query results in 175 clients who have a particular treatment code and have a treatment enddate in 2014. Now for each of these clients a lot of other data is retrieved (like name, age, treatment hours) which is not relevant and I left out now. Then there are about 30 similar subqueries, which retrieve answers from forms. I used a correlated query, because to retrieve the answers from these forms, the client id must be known. If that was the only thing the subquery needed to find the data, it wouldn't be a problem, but there is an other requirement: the retrieved forms must be filled in within the treatment period, because I couldn't find a way to push this data from the outer query to the sub-sub-query, I query this again in the subsubquery, which is causing the slow speed.
The reason to have a subquery and a subsubquery is because the N-th ranking answer from a form must be found. In a previous version of my code I didn't have the treatmentcode, treatment start and enddate requirements in the where clause of the subsubquery. This caused the subsubquery to come up with e.g. 4 results which were ranked 1,2,3,4 but not necessarily of forms created within the treatment period, which is wrong.
So adding these lines:
AND trtm.TREATMENTCODE = 'K'
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD')
AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
AND foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, to_date('01/01/9999', 'dd/mm/yyyy'))
caused the query to be correct, where it was not completely correct before. They also caused the query to take several hours, instead of ~40 seconds for 175 rows.
My question now is, how can I rewrite this query to make it faster? I use Oracle 11.2.40 in combination with Toad Data Point 3.5, but I can't see the explain plan unfortunately.
Upvotes: 0
Views: 296
Reputation: 5636
You have a lot of superfluous constructs in here, like TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
. You are calling trunc
saying "strip off any time component" and then passing it a constructed date without a time component to begin with. Just say date '2014-01-01'
and be done with it.
As for working with date ranges, if you want to select dates that are in, say, 2014, the best way is to compare like this: myDate >= date '2014-01-01' and myDate < date '2015-01-01'
. This way you don't have to worry about myDate
having a time component and what that time value may be. Save between
for data types with discreet values or dates you know are already in the discreet components you want.
None of these suggestion are a solution to your particular problem. But get into the habit of writing "skinny" code in the first place and it will simplify your search for the problem if it ever runs too slowly.
One major recommendation that will probably speed up your query but even if not will significantly simplify it so increases the maintainability, is to remove subqueries from the select list.
In general, for complicated queries, don't try to write the entire thing in one go. Select a table (treatment in your case) and select the data you know you will be needing. Examine the results. Get to know it if you don't already. Make sure it is complete and accurate.
select t.CLIENT_ID, t.TREATMENTCODE, t.ENDDATE
from treatment t
where t.TREATMENTCODE = 'K'
and t.ENDDATE >= date '2014-01-01'
and t.ENDDATE < date '2015-01-01';
Now join the next table to it, adding to the select list the data you want to see from that table, removing the data from the first that you are satisfied with and don't otherwise need.
select t.CLIENT_ID, c.CLIENT_ID
from treatment t
join client c
on c.CLIENT_ID = t.CLIENT_ID
where t.TREATMENTCODE = 'K'
and t.ENDDATE >= date '2014-01-01'
and t.ENDDATE < date '2015-01-01';
Add any fields you need in the select list to verify you are getting exactly the right result (for the conditions you have specified so far). Repeat for each of the other tables, adding them one at a time until you get the final result. This way, if you suddenly start getting wrong results, you'll know which table started the problem.
Your final result set may contain lots of rows you don't need. That's ok as long as it contains all of the rows you do need. Save the filtering for the last as you want to be able to see all the data the query is generating. When you know the data contains everything you need, the final step is to filter out unwanted results until you have only what you want. But being able to see all the data can show you many ways to perform that filtering, ways that may not be obvious if you filter out data early.
I don't have any test data so I can't test my candidate below. However, it should be fairly close unless I've completely missed something (a distinct possibility). If nothing else, maybe it can point you toward a solution.
SELECT c.CLIENT_ID, to_number( fif.ANSWER, '999' ) form1
FROM treatment t
join CLIENT c
on c.CLIENT_ID = t.CLIENT_ID
join forms_filled ff
on ff.CLIENT_ID = c.CLIENT_ID
join forms_items_filled fif
on fif.FORM_ID = ff.FORM_ID
WHERE t.TREATMENTCODE = 'K'
and fif.FORM_NUMBER = 607
AND fif.FORM_ITEM_NUMBER = 3779
AND length( fif.ANSWER ) >= 1
AND t.ENDDATE >= date '2014-01-01'
AND t.ENDDATE < date '2015-01-01'
AND ff.STARTDATE BETWEEN t.STARTDATUM AND NVL(t.ENDDATE, date '9999-12-31');
One other suggestion: when you have a field like end_date
, the first impulse is to use NULL
as an indicator of "there is no end date yet defined." Try setting it to NOT NULL and using a default of the Max Date value, date '9999-12-31'
. This means the same thing and simplifies comparisons by getting rid of the need for nvl
or other ways of handling NULL values.
Edit: Oops. I had moved the windowing function to get it out of the way because I was only after partial results. It got included when I cut/pasted the code.
Oh well, may as well include that in a final answer.
with
Partial( CLIENT_ID, form1, rnk )as(
SELECT c.CLIENT_ID, to_number( fif.ANSWER, '999' ) form1,
row_number() over(PARTITION BY ff.CLIENT_ID ORDER BY ff.FORM_ID ASC) rnk
FROM treatment t
join CLIENT c
on c.CLIENT_ID = t.CLIENT_ID
join forms_filled ff
on ff.CLIENT_ID = c.CLIENT_ID
join forms_items_filled fif
on fif.FORM_ID = ff.FORM_ID
WHERE t.TREATMENTCODE = 'K'
and fif.FORM_NUMBER = 607
AND fif.FORM_ITEM_NUMBER = 3779
AND fif.ANSWER is not null
AND t.ENDDATE >= date '2014-01-01'
AND t.ENDDATE < date '2015-01-01'
AND ff.STARTDATE BETWEEN t.STARTDATUM AND NVL(t.ENDDATE, date '9999-12-31')
)
select CLIENT_ID, form1
from Partial
where rnk = 1;
Assuming this gets you pretty close, if you look at the execution plan between this one and your original, you should see a significant improvement.
And one other change. You are testing a string to make sure it has at least one character. In Oracle that is not necessary as an empty string is treated as a NULL
. Just check for NOT NULL.
Upvotes: 1
Reputation: 1269973
If you use the keep
keyword to get the first value, you can dispense with the nested subqueries. This, in turn, allows you to use a query correlated with the outer query, so you don't have to re-calculate the results for all rows to get the value for a given row.
The query would look like:
SELECT (SELECT max(to_number(fiit.ANSWER, '999')) keep (dense_rank first order by foin.FORM_ID ASC)
FROM forms_filled foin JOIN
forms_items_filled fiit
ON foin.FORM_ID = fiit.FORM_ID
WHERE foin.CLIENT_ID = trtm.CLIENT_ID AND
fiit.FORM_NUMBER = 607
fiit.FORM_ITEM_NUMBER = 3779 AND
length(fiit.ANSWER) >= 1 AND
foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, DATE '1999-01-01')
)
I also encourage you to use modern explicit join
syntax and the date
keyword for expressing date constants.
Upvotes: 2