Reimius
Reimius

Reputation: 5712

Inner Join versus Exists() while avoiding duplicate rows

This is a complicated question so bear with me as I set up the scenario:

Say we have a simplified table setup like so:

table 1(employee): {
   employee_id, -primary key
   first_name,
   last_name,
   days_of_employment
}
with data:
employee_id  first_name last_name days_of_employment
111          Jack       Stevens   543
222          Clarice    Bobber    323
333          Roy        Cook      736
444          Fred       Roberts   1000
...

table 2(teams): {
   team_code, --primary key
   description
}
with data:
team_code description
ERA       Enrollment Records Assoc.
RR        Rolling Runners
FR        French Revolution
...

table 3(employees_teams):{
   employee_id, --primary key
   team_code --primary key
}
with data:
employee_id team_code
111         RR
111         FR
222         FR
222         ERA
333         FR
...

I'm hoping these tables should be clear as to what they are and their purpose. Here is my scenario from requirements: "I want the average days of employment of employees on the Rolling Runners and Enrollment Records Assoc. team." There are two ways I know how to write this query and they both seem to work well enough, but what I really want to know is which one is faster for the oracle database to process. Keep in mind that these queries are written the way they are to keep from producing duplicate rows which would screw up the average calculation:

Query 1:

SELECT AVG(e.days_of_employment) avg_days_of_employment
FROM   employee e,
       (
        SELECT DISTINCT employee_id
        FROM   employees_teams
        WHERE  team_code IN ('ERA','RR')) available_employees
WHERE  e.employee_id = available_employees.employee_id

Query 2:

SELECT AVG(e.days_of_employment) avg_days_of_employment
FROM   employee e
WHERE  EXISTS(
    SELECT 1
    FROM   employees_teams et
    WHERE  et.team_code IN ('ERA','RR')
      AND  et.employee_id = e.employee_id)

It is possible that with this sample data I provided that this situation may not make sense to begin with, but I still would like to know which query is 'better' to use.

Upvotes: 3

Views: 1810

Answers (1)

Rohit
Rohit

Reputation: 583

I would say go with the EXISTS approach since you are not really needing anything from the available_employees other than checking for the existence.
Having said that it depends on your data as well and how your database query optimizer optmizes it. I would suggest you to see the query plan for each approach and see which one is less expensive.

Check these links as well http://dotnetvj.blogspot.com/2009/07/why-we-should-use-exists-instead-of.html Can an INNER JOIN offer better performance than EXISTS

Upvotes: 5

Related Questions