Reputation: 575
Ok, so I'm working on a (rather old) project at work, which runs loads of queries against an Oracle database. I recently stumbled upon this gem, which takes about 6-7 hours to run and returns ~1400 rows. The table/view in question contains ~200'000 rows. That seemed to be taking rather longer than reasonable, so I started having a closer look at it. I can't, for security/proprietary reasons, share the exact query, but this should show what it does in more general terms:
SELECT
    some_field,
    some_other_field
FROM (
    SELECT
        *
    FROM
        some_view a
    WHERE
        some_criteria AND
        a.client_no || ':' || a.engagement_no || ':' || a.registered_date = (
            SELECT
                b.client_no || ':' || b.engagement_no || ':' || MAX(b.registered_date)
            FROM
                some_view b
            JOIN some_engagement_view e
                ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
            JOIN some_client_view c
                ON c.client_no = b.client_no
            WHERE
                some_other_criteria AND
                b.client_no = a.client_no AND
                b.engagement_no = a.engagement_no
            GROUP BY
                b.client_no,
                b.engagement_no
        )
);
Basically, as far as I've managed to figure out, it is supposed to fetch, from some_view (which contains evaluations of customers/engagements), the latest evaluation for every unique client/engagement.
The two joins are there to ensure that the client and engagement exist in another system, where they are primarily handled after the evaluation has been done in this system.
Notice how it concatenates two numbers and a date, and then compares that to a sub-query? "Interesting" design-choice. So I thought that if you replaced the concatenation with a proper comparison, you might get at least some kind of performance gain. Please note that I primarily do .NET development for the web, and am far from an expert when it comes to databases, but I rewrote it as follows:
SELECT
    some_field,
    some_other_field
FROM
    some_view a
WHERE
    some_criteria AND
    (a.client_no, a.engagement_no, a.registered_date) = (
        SELECT
            b.client_no,
            b.engagement_no,
            MAX(b.registered_date)
        FROM
            some_view b
        JOIN some_engagement_view e
            ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
        JOIN some_client_view c
            ON c.client_no = b.client_no
        WHERE
            some_other_criteria AND
            b.client_no = a.client_no AND
            b.engagement_no = a.engagement_no
        GROUP BY
            b.client_no,
            b.engagement_no
    );
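As an aside on why the tuple comparison seems preferable beyond speed: the original concatenation compares strings, so the date is run through an implicit date-to-string conversion that depends on the session's NLS settings, and without a separator distinct keys could even collide outright (the ':' in the original guards against that, but the comparison is still string-based, which also tends to rule out index use on the individual columns). A quick illustration against DUAL, with made-up values:

```sql
-- Without a separator, the distinct key pairs (12, 3) and (1, 23)
-- both concatenate to '123'; with ':' they stay distinct.
SELECT
    CASE WHEN 12 || 3 = 1 || 23
         THEN 'collision' ELSE 'distinct' END AS no_separator,
    CASE WHEN 12 || ':' || 3 = 1 || ':' || 23
         THEN 'collision' ELSE 'distinct' END AS with_separator
FROM dual;
-- no_separator: 'collision' ('123' = '123'); with_separator: 'distinct'.
```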
Now if I replace the fields in the very first select with a COUNT(1), I get exactly the same number of rows with both queries, so a good start. The new query fetches the data just as fast as it counts: < 10 seconds. The old query gets the count in ~20 seconds, and as I mentioned before, fetching the data takes close to 6-7 hours. It is currently running so that I can do some kind of analysis to see if the new query is valid, but I thought that I'd ask here as well: is there anything apparently wrong with what I have done?
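A stricter check than comparing row counts would be a set difference in both directions, which should come back empty if the two queries are equivalent. A sketch, where `<old query>` and `<new query>` are placeholders for the two full statements above:

```sql
-- Each MINUS should return zero rows if the rewrite is equivalent.
SELECT * FROM ( <old query> )
MINUS
SELECT * FROM ( <new query> );

SELECT * FROM ( <new query> )
MINUS
SELECT * FROM ( <old query> );
```

One caveat: MINUS removes duplicate rows, so if duplicates matter, the COUNT(1) comparison is still worth keeping alongside it.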
EDIT: I also removed the outermost query, which did not seem to serve any purpose, except maybe making the query look cooler... or something... I dunno...
Upvotes: 1
Views: 157
Reputation: 191285
Expanding on my comment... if I try to replicate your query structure using built-in views, it also runs for a long time. For example, getting the most recently created table for each owner (purely for demo purposes; it can be done more simply) like this takes several minutes, with either version:
SELECT
    owner,
    object_name
FROM
    all_objects a
WHERE
    (a.owner, a.object_type, TRUNC(a.created)) = (
        SELECT
            b.owner, b.object_type, TRUNC(MAX(b.created))
        FROM
            all_objects b
        JOIN all_tables e
            ON e.owner = b.owner AND e.table_name = b.object_name
        JOIN all_users c
            ON c.username = b.owner
        WHERE
            b.owner = a.owner AND
            b.object_type = a.object_type
        GROUP BY
            b.owner,
            b.object_type
    );
If I rewrite that to avoid the self-join on all_objects (equivalent to some_view in your example) by using an analytic function instead:
SELECT
    owner,
    object_name
FROM (
    SELECT
        a.owner,
        a.object_name,
        row_number() over (partition by a.owner, a.object_type
                           order by a.created desc) as rn
    FROM
        all_objects a
    JOIN all_tables e
        ON e.owner = a.owner AND e.table_name = a.object_name
    JOIN all_users c
        ON c.username = a.owner
)
WHERE
    rn = 1;
... then it takes a few seconds.
Now, in this case I don't get exactly the same output, because I have multiple objects created at the same time (within the same second, as far as created is concerned). I don't know the precision of the values stored in your registered_date, of course. So you might need to look at different functions, possibly rank rather than row_number, or adjust the ordering to deal with ties if necessary:
rank() over (partition by a.owner, a.object_type
order by trunc(a.created) desc) as rn
...
WHERE
rn = 1;
gives me the same results (well, almost; the join to all_tables is also skewing things, as I seem to have tables listed in all_objects that aren't in all_tables, but that's a side issue). Or max could work too:
max(created) over (partition by a.owner, a.object_type) as mx
...
WHERE
TRUNC(created) = TRUNC(mx)
In both of those I'm using trunc to get everything onto the same day; you may not need to if your registered_date doesn't have a time component.
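Translated back to your placeholder schema, the analytic version would look something like the sketch below. It is untested, since the real names and criteria aren't shown, and it assumes some_criteria and some_other_criteria can be applied together in one WHERE clause:

```sql
SELECT
    some_field,
    some_other_field
FROM (
    SELECT
        a.*,
        row_number() over (partition by a.client_no, a.engagement_no
                           order by a.registered_date desc) as rn
    FROM
        some_view a
    JOIN some_engagement_view e
        ON e.client_no = a.client_no AND e.engagement_no = a.engagement_no
    JOIN some_client_view c
        ON c.client_no = a.client_no
    WHERE
        some_criteria AND some_other_criteria
)
WHERE
    rn = 1;
```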
But of course, check you do actually get the same results.
Upvotes: 1