codemonkey47
codemonkey47

Reputation: 17

SQL JOIN MULTIPLE TABLES: Cleaner/More efficent Code possible?

I have several tables that I have joined in my query : hr.employees,hr.departments, hr.locations, hr.countries: hr.employees:

name salary department_id (points to hr.departments)
... ... ...
... ... ...

hr.departments

department_id location_id (points to hr.locations)
... ...
... ...

hr.locations

location_id country_id (points to hr.countries)
... ...
... ...

hr.countries

country_id country_name
... ...
... ...

For my query I have to display the salary of each employee with a higher or equal salary to that of the average salary of a country

The code looks as follows and gives me the correct result:

SELECT 
e.first_name || ' ' || e.last_name AS name,
TRIM(CAST(con.country_name AS CHAR(25))) AS country,
e.salary AS salary,
(SELECT ROUND(AVG(e1.salary)) FROM hr.employees e1
    JOIN hr.departments dep1 ON e1.department_id = dep1.department_id
    JOIN hr.locations loc1 ON dep1.location_id = loc1.location_id 
    WHERE country_id = loc.country_id
)AS avg_sal_country
FROM hr.employees e
JOIN hr.departments dep ON e.department_id = dep.department_id
JOIN hr.locations loc ON dep.location_id = loc.location_id
JOIN HR.countries con ON loc.country_id = con.country_id
WHERE (salary >= (SELECT AVG(e1.salary) FROM hr.employees e1
    JOIN hr.departments dep1 ON e1.department_id = dep1.department_id
    JOIN hr.locations loc1 ON dep1.location_id = loc1.location_id 
    WHERE country_id = loc.country_id) )
;

According to the grading system my professor uses the cost of my statement is 27 while the cost of the query from the professor is 16, how can I rewrite this to be more efficent?

Edit: My result looks like this

name country salary avg_sal_country
Bob USA 4000 3800
Suzy UK 3000 2000
Tom USA 5000 3800

Upvotes: 0

Views: 58

Answers (2)

d r
d r

Reputation: 7836

Just for comparisson - your query generates explain plan starting like:

Plan hash value: 4172893912
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     9 |   729 |    25  (16)| 00:00:01 |
|*  1 |  HASH JOIN OUTER                     |                  |     9 |   729 |    25  (16)| 00:00:01 |   

... it is clear that you use two subqueries (in Select And in Where) and they use the same tables as the main query. If we slightly correct the sintax and get ridd just of 1st subquery so it would look like:

-- STEP 1  -- changed sintax and replaced subquery column with AVG() OVER() Analytic Function
Select      e.FIRST_NAME || ' ' || e.LAST_NAME "NAME",
            con.COUNTRY_NAME "COUNTRY",
            e.SALARY "SALARY",
            Ceil(Avg(e.SALARY) OVER(Partition By loc.COUNTRY_ID)) "AVG_SAL_COUNTRY"  -- Analytic function instead of subquery for every row
From        HR.EMPLOYEES e
Inner Join  HR.DEPARTMENTS dep ON e.DEPARTMENT_ID = dep.DEPARTMENT_ID
Inner Join  HR.LOCATIONS loc ON dep.LOCATION_ID = loc.LOCATION_ID
Inner Join  HR.COUNTRIES con ON loc.COUNTRY_ID = con.COUNTRY_ID
Where       (  SALARY >= (  Select      Avg(e1.SALARY)              -- jet another subquery for every row
                            From        HR.EMPLOYEES e1
                            Inner Join  HR.DEPARTMENTS dep1 ON (e1.DEPARTMENT_ID = dep1.DEPARTMENT_ID)
                            Inner Join  HR.LOCATIONS loc1 ON dep1.LOCATION_ID = loc1.LOCATION_ID 
                            Where       loc1.COUNTRY_ID = loc.COUNTRY_ID
                        ) 
            )
Plan hash value: 2085091834
---------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |     7 |   441 |    17  (18)| 00:00:01 |
|   1 |  WINDOW SORT                         |                  |     7 |   441 |    17  (18)| 00:00:01 |

.. it looks better now - what would the result be like if we dump the other subquery from Where clause and transform it into joined subquery...

--  STEP 2  replaced where condition subquery and used it joined to the tables
Select      e.FIRST_NAME || ' ' || e.LAST_NAME "NAME",
            con.COUNTRY_NAME "COUNTRY",
            e.SALARY "SALARY",
            avgs.AVG_SAL_COUNTRY  -- no need for analytic function any more
From        HR.EMPLOYEES e
Inner Join  HR.DEPARTMENTS dep ON e.DEPARTMENT_ID = dep.DEPARTMENT_ID
Inner Join  HR.LOCATIONS loc ON dep.LOCATION_ID = loc.LOCATION_ID
Inner Join  HR.COUNTRIES con ON loc.COUNTRY_ID = con.COUNTRY_ID
Inner Join  (   Select        loc1.COUNTRY_ID, CEIL(Avg(e1.SALARY)) "AVG_SAL_COUNTRY"               
                From          HR.EMPLOYEES e1 
                Inner Join  HR.DEPARTMENTS dep1 ON (dep1.DEPARTMENT_ID = e1.DEPARTMENT_ID)
                Inner Join  HR.LOCATIONS loc1 ON (loc1.LOCATION_ID = dep1.LOCATION_ID )
                Group By    loc1.COUNTRY_ID
            ) avgs ON(avgs.COUNTRY_ID = loc.COUNTRY_ID)
Where       e.SALARY >= avgs.AVG_SAL_COUNTRY
Plan hash value: 21112100
--------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                  |     7 |   448 |    16  (13)| 00:00:01 |
|   1 |  NESTED LOOPS                       |                  |     7 |   448 |    16  (13)| 00:00:01 |

... well, a bit better (16) but not much. So, let us adjust the query with analytic function ...

-- STEP 3  -- back to AVG() OVER() Analytic Function
SELECT NAME, COUNTRY, SALARY, AVG_SAL_COUNTRY
FROM  (
        Select      e.FIRST_NAME || ' ' || e.LAST_NAME "NAME",
                    con.COUNTRY_NAME "COUNTRY",
                    e.SALARY "SALARY",
                    Ceil(Avg(e.SALARY) OVER(Partition By loc.COUNTRY_ID)) "AVG_SAL_COUNTRY"  -- Analytic function instead of subquery for every row
        From        HR.EMPLOYEES e
        Inner Join  HR.DEPARTMENTS dep ON e.DEPARTMENT_ID = dep.DEPARTMENT_ID
        Inner Join  HR.LOCATIONS loc ON dep.LOCATION_ID = loc.LOCATION_ID
        Inner Join  HR.COUNTRIES con ON loc.COUNTRY_ID = con.COUNTRY_ID
      )
WHERE SALARY >= AVG_SAL_COUNTRY
Plan hash value: 94384860
------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |   106 |  7738 |     8  (13)| 00:00:01 |
|*  1 |  VIEW                       |                  |   106 |  7738 |     8  (13)| 00:00:01 |

And that would be it - looks better (8) than what we have started with (25)

Upvotes: 0

GMB
GMB

Reputation: 222612

I would recommend window functions (aka analytic functions in Oracle), which can efficiently replace the subquery in your code:

SELECT name, country, salary, ROUND(avg_sal_country) AS avg_sal_country
FROM (
    SELECT 
        e.first_name || ' ' || e.last_name AS name,
        TRIM(CAST(con.country_name AS CHAR(25))) AS country,
        e.salary,
        AVG(e.salary) OVER(PARTITION BY loc.country_id) AS avg_sal_country
    FROM hr.employees e
    JOIN hr.departments dep ON e.department_id = dep.department_id
    JOIN hr.locations loc ON dep.location_id = loc.location_id
    JOIN HR.countries con ON loc.country_id = con.country_id
) x
WHERE salary >= avg_sal_country

Upvotes: 2

Related Questions