Clay Banks
Clay Banks

Reputation: 4581

More Efficient Query than the EXISTS Condition

I was reading up on the SQL EXISTS Condition and found this snippet from Techonthenet.com

Note: SQL Statements that use the SQL EXISTS Condition are very inefficient since the sub-query is RE-RUN for EVERY row in the outer query's table. There are more efficient ways to write most queries, that do not use the SQL EXISTS Condition

Unless I skipped over it, the article does not explain a more efficient query that doesn't need this condition. Anyone have an idea of what they could be referring to?

Upvotes: 1

Views: 8762

Answers (4)

emre ozcan
emre ozcan

Reputation: 141

This question is answered at Oracle Document 11.5.3.4 Use of EXISTS versus IN for Subqueries.

11.5.3.4 Use of EXISTS versus IN for Subqueries

In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.

Sometimes, Oracle can rewrite a subquery when used with an IN clause to take advantage of selectivity specified in the subquery. This is most beneficial when the most selective filter appears in the subquery and there are indexes on the join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.

"Example 1: Using IN - Selective Filters in the Subquery" and "Example 2: Using EXISTS - Selective Predicate in the Parent" are two examples that demonstrate the benefits of IN and EXISTS. Both examples use the same schema with the following characteristics:

  • There is a unique index on the employees.employee_id field.

  • There is an index on the orders.customer_id field.

  • There is an index on the employees.department_id field.

  • The employees table has 27,000 rows.

  • The orders table has 10,000 rows.

  • The OE and HR schemas, which own these segments, were both analyzed with COMPUTE.

    11.5.3.4.1 Example 1: Using IN - Selective Filters in the Subquery

This example demonstrates how rewriting a query to use IN can improve performance. This query identifies all employees who have placed orders on behalf of customer 144.

The following SQL statement uses EXISTS:

SELECT /* EXISTS example */
         e.employee_id, e.first_name, e.last_name, e.salary
  FROM employees e
 WHERE EXISTS (SELECT 1 FROM orders o                  /* Note 1 */
                  WHERE e.employee_id = o.sales_rep_id   /* Note 2 */
                    AND o.customer_id = 144);            /* Note 3 */

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The plan requires a full table scan of the employees table, returning many rows. Each of these rows is then filtered against the orders table (through an index).

 ID OPERATION            OPTIONS         OBJECT_NAME            OPT       COST
---- -------------------- --------------- ---------------------- --- ----------
   0 SELECT STATEMENT                                            CHO
   1  FILTER
   2   TABLE ACCESS       FULL            EMPLOYEES              ANA        155
   3   TABLE ACCESS       BY INDEX ROWID  ORDERS                 ANA          3
   4    INDEX             RANGE SCAN      ORD_CUSTOMER_IX        ANA          1

Rewriting the statement using IN results in significantly fewer resources used.

The SQL statement using IN:

SELECT /* IN example */
         e.employee_id, e.first_name, e.last_name, e.salary
    FROM employees e
   WHERE e.employee_id IN (SELECT o.sales_rep_id         /* Note 4 */
                             FROM orders o
                            WHERE o.customer_id = 144);  /* Note 3 */

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The optimizer rewrites the subquery into a view, which is then joined through a unique index to the employees table. This results in a significantly better plan, because the view (that is, subquery) has a selective predicate, thus returning only a few employee_ids. These employee_ids are then used to access the employees table through the unique index.

  ID OPERATION            OPTIONS         OBJECT_NAME            OPT       COST
---- -------------------- --------------- ---------------------- --- ----------
   0 SELECT STATEMENT                                            CHO
   1  NESTED LOOPS                                                            5
   2   VIEW                                                                   3
   3    SORT              UNIQUE                                              3
   4     TABLE ACCESS     FULL            ORDERS                 ANA          1
   5   TABLE ACCESS       BY INDEX ROWID  EMPLOYEES              ANA          1
   6    INDEX             UNIQUE SCAN     EMP_EMP_ID_PK          ANA

11.5.3.4.2 Example 2: Using EXISTS - Selective Predicate in the Parent

This example demonstrates how rewriting a query to use EXISTS can improve performance. This query identifies all employees from department 80 who are sales reps who have placed orders.

The following SQL statement uses IN:

SELECT /* IN example */
         e.employee_id, e.first_name, e.last_name, e.department_id, e.salary
    FROM employees e
   WHERE e.department_id = 80                                    /* Note 5 */
     AND e.job_id        = 'SA_REP'                              /* Note 6 */
     AND e.employee_id IN (SELECT o.sales_rep_id FROM orders o); /* Note 4 */

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The SQL statement was rewritten by the optimizer to use a view on the orders table, which requires sorting the data to return all unique employee_ids existing in the orders table. Because there is no predicate, many employee_ids are returned. The large list of resulting employee_ids are then used to access the employees table through the unique index.

  ID OPERATION            OPTIONS         OBJECT_NAME            OPT       COST
---- -------------------- --------------- ---------------------- --- ----------
   0 SELECT STATEMENT                                            CHO
   1  NESTED LOOPS                                                          125
   2   VIEW                                                                 116
   3    SORT              UNIQUE                                            116
   4     TABLE ACCESS     FULL            ORDERS                 ANA         40
   5   TABLE ACCESS       BY INDEX ROWID  EMPLOYEES              ANA          1
   6    INDEX             UNIQUE SCAN     EMP_EMP_ID_PK          ANA

The following SQL statement uses EXISTS:

SELECT /* EXISTS example */
         e.employee_id, e.first_name, e.last_name, e.salary
    FROM employees e
   WHERE e.department_id = 80                           /* Note 5 */
     AND e.job_id        = 'SA_REP'                     /* Note 6 */
     AND EXISTS (SELECT 1                               /* Note 1 */
                   FROM orders o
                  WHERE e.employee_id = o.sales_rep_id);  /* Note 2 */

The following plan output is the execution plan (from V$SQL_PLAN) for the preceding statement. The cost of the plan is reduced by rewriting the SQL statement to use an EXISTS. This plan is more effective, because two indexes are used to satisfy the predicates in the parent query, thus returning only a few employee_ids. The employee_ids are then used to access the orders table through an index.

  ID OPERATION            OPTIONS         OBJECT_NAME            OPT       COST
---- -------------------- --------------- ---------------------- --- ----------
   0 SELECT STATEMENT                                            CHO
   1  FILTER
   2   TABLE ACCESS       BY INDEX ROWID  EMPLOYEES              ANA         98
   3    AND-EQUAL
   4     INDEX            RANGE SCAN      EMP_JOB_IX             ANA
   5     INDEX            RANGE SCAN      EMP_DEPARTMENT_IX      ANA
   6   INDEX              RANGE SCAN      ORD_SALES_REP_IX       ANA          8

Upvotes: 0

Venkatesh R
Venkatesh R

Reputation: 514

When you have main query result set small and result set of sub-query is Large and sub-query uses appropriate indexes - EXISTS / NOT EXISTS is better option in place of IN / NOT IN Clause.

When you have index on larger result set of main query and smaller result set in the sub-query - IN / NOT IN is better option in place of EXISTS / NOT EXISTS Clause.

Upvotes: 0

Luaan
Luaan

Reputation: 63722

You can usually use some "clever" inner join or something like that.

However, all in all, the advice is severely outdated. Yes, there used to be a time when subqueries had a huge cost, but that isn't necessarily the case anymore - as always, profile. And examine execution plans. It's very much possible your DB engine can handle subqueries just fine - in fact, it can be much faster than the hacky inner join (and similar solutions) :)

Always make sure you understand the rationale behind the advice, and to what it actually applies. A simple example on MS SQL:

select * from order
where exists (select * from user where order.CreatedBy = user.Id)

What a horrible sub-query, right? Totally going to run the subquery for every row of the order table, right? Well, the execution planner is smart enough to translate this into a simple left join - involving just two table scans (or, if applicable, index seeks). In other cases, the engine might decide to build hash sets, or temporary tables, or do any other smart thing to make sure the query is fast (within the other trade-offs, like memory usage). Nowadays, you will rarely find that your query tweaks are smarter than what the execution planner does - if your DB engine is up to the task. In fact, this is the whole reason we use SQL - a declarative language - in the first place! Instead of saying how the results should be obtained, you say what relationships lead to the result set you want, giving the DB engine a massive freedom in how to actually get the data - whether it means going through every single row in a table one by one, or seeking through an index.

The default should always be to write the query in a way that makes the most sense. Once you've got a nice, clean and easy to understand query, think about any performance implications, and profile the results (using realistic test data). Look at the execution plan of the query - if you care about SQL performance, you really need to understand execution plans anyway; they tell you all there is to know about the way the query is actually executed, and how to improve various parts of the query (or, more often, the indices and statistics involved).

Upvotes: 5

Erich Kitzmueller
Erich Kitzmueller

Reputation: 36977

First of all, don't trust general statements like

Note: SQL Statements that use the SQL EXISTS Condition are very inefficient since the sub-query is RE-RUN for EVERY row in the outer query's table.

This can be true for some database systems, but other database systems might be able to find a more efficient execution plan for such statements. For example, I tried such a statement on my Oracle database and it uses a hash join to execute the statement efficiently.

Now for the alternatives:

In many cases, you can use an IN subquery. This might work out well even on database systems that would execute EXISTS inefficiently.

So, instead of

select * from foo where exists (select 1 from bar where foo.x = bar.y)

write

select * from foo where x in (select y from bar)

The same can be written with ANY

select * from foo where x = any (select y from bar)

In many cases, it's most desirable to use a join, e.g.

select foo.* from foo inner join bar on foo.x = bar.y

You might have to use DISTINCT to make sure you don't get duplicate results when a row in foo matches more than one row in bar, though.

select distinct foo.* from foo inner join bar on foo.x = bar.y

Upvotes: 3

Related Questions