Roger Feddering
Roger Feddering

Reputation: 67

SQL Oracle query to compare male vs. female salaries for each job at a company

return one row for each Job_ID with the following columns:

• Job ID

Average female service length in that job in years rounded to the tenth of a year

Average Female Salary for that job

Average male service length in that job in years rounded to the tenth of a year

Average Male Salary for that job

The difference in average male and female salaries for that job such that a positive number means the average female salary is higher and negative number means the average male salary is higher.

TABLE

HR.EMPLOYEES

Name           Null     Type         
-------------- -------- ------------ 
EMPLOYEE_ID    NOT NULL NUMBER(6)    
FIRST_NAME              VARCHAR2(20) 
LAST_NAME      NOT NULL VARCHAR2(25) 
EMAIL          NOT NULL VARCHAR2(25) 
PHONE_NUMBER            VARCHAR2(20) 
HIRE_DATE      NOT NULL DATE         
JOB_ID         NOT NULL VARCHAR2(10) 
SALARY                  NUMBER(8,2)  
COMMISSION_PCT          NUMBER(2,2)  
MANAGER_ID              NUMBER(6)    
DEPARTMENT_ID           NUMBER(4)    
GENDER                  CHAR(1) 

What I Have So Far but this is returning an M & F row for each job_id i need M & F columns

SELECT gender, job_id, ROUND(AVG(salary),0) as avg_job_salary,
(SELECT ROUND(AVG(salary),0)
FROM hr.employees 
WHERE gender = 'M') AS avg_m_salary, (SELECT ROUND(AVG(salary),0) 
FROM hr.employees 
WHERE gender = 'F') AS avg_f_salary,
ROUND(AVG(days_of_svc/365),1) AS avg_years_svc
FROM (SELECT  job_id, salary, gender, (SYSDATE-hire_date) AS  days_of_svc 
FROM hr.employees)
GROUP BY job_id,gender
ORDER BY job_id, gender;

or version 2

SELECT gender, job_id, ROUND(AVG(salary),0) as avg_job_salary, 
((SELECT ROUND(AVG(salary),0)
FROM hr.employees 
WHERE gender = 'F') - (SELECT ROUND(AVG(salary),0) 
FROM hr.employees 
WHERE gender = 'M')) as diff,
ROUND(AVG(days_of_svc/365),1) AS avg_years_svc
FROM (SELECT  job_id, salary, gender, (SYSDATE-hire_date) AS  days_of_svc 
FROM hr.employees)
GROUP BY job_id,gender
ORDER BY job_id, gender;

sample expected results rows

JOB_ID    F_AVG_LENGTH F_AVG_SAL M_AVG_LENGTH M_AVG_SAL DIFFERENCE
------    ------------ --------- ------------ --------- ----------
MAILCLERK         24.1     48000         23.4     47000       1000
CASHIER            4.6     12000          4.4     13500      -1500

Upvotes: 5

Views: 2527

Answers (3)

user5683823
user5683823

Reputation:

On my machine I made a copy of HR.EMPLOYEES in my schema, I named the clone HR_EMPLOYEES. Then I added a column for GENDER since on my copy of Oracle, the HR.EMPLOYEES table does not have a GENDER column. I populated the column with my best guesses, just for testing.

In Oracle 11 you can use the PIVOT operation, which makes the job easier. I divided average tenure by 365.25 so it is expressed in years rather than days. Note that there are many jobs that have either no males or no females working in them, so there are many NULL results. I assumed you want them shown too - otherwise they can be left out.

select job_id, round(F_AVG_TENURE_D/365.25, 1)    as f_avg_length, 
               round(F_AVG_SALARY)                as f_avg_salary,
               round(M_AVG_TENURE_D/365.25, 1)    as m_avg_length, 
               round(M_AVG_SALARY)                as m_avg_salary,       
               round(F_AVG_SALARY - M_AVG_SALARY) as avg_sal_diff
from  ( 
        select job_id, gender, sysdate - hire_date as tenure, salary
        from   hr_employees
      )
pivot (avg(tenure) as avg_tenure_d, avg(salary) as avg_salary 
                                                for gender in ('F' as F, 'M' as M))
order by avg_sal_diff desc nulls last, job_id   --  ORDER BY is optional
;

Output:

JOB_ID     F_AVG_LENGTH F_AVG_SALARY M_AVG_LENGTH M_AVG_SALARY AVG_SAL_DIFF
---------- ------------ ------------ ------------ ------------ ------------
SH_CLERK           11.2         3511          9.9         2973          538
ST_MAN             12.3         7467         10.3         7000          467
ST_CLERK           10.5         2883         10.8         2743          140
PU_CLERK           11.6         2833           10         2700          133
AD_VP              11.1        17000         15.8        17000            0
SA_REP             10.3         8244         10.6         8471         -228
SA_MAN             10.3        12000         10.9        12333         -333
IT_PROG            10.2         4500         10.5         6600        -2100
AC_ACCOUNT                                   14.4         8300
AC_MGR             14.4        12008
AD_ASST            13.1         4400
AD_PRES                                      13.4        24000
FI_ACCOUNT                                   11.2         7920
FI_MGR             14.2        12008
HR_REP             14.4         6500
MK_MAN                                       12.7        13000
MK_REP             11.2         6000
PR_REP                                       14.4        10000
PU_MAN                                       13.9        11000

19 rows selected.

Upvotes: 1

I_am_Batman
I_am_Batman

Reputation: 915

Your expected result does not require gender as output column.

So you need to remove it from your select and group by statements :

SELECT job_id, ROUND(AVG(salary),0) as avg_job_salary,
(SELECT ROUND(AVG(salary),0)
FROM hr.employees 
WHERE gender = 'M') AS avg_m_salary, (SELECT ROUND(AVG(salary),0) 
FROM hr.employees 
WHERE gender = 'F') AS avg_f_salary,
ROUND(AVG(days_of_svc/365),1) AS avg_years_svc
FROM (SELECT  job_id, salary, gender, (SYSDATE-hire_date) AS  days_of_svc 
FROM hr.employees)
GROUP BY job_id
ORDER BY job_id;

There are too many dips into the same table in this approach though, in subselect. Let's try to optimize it.

EDIT :

select job_id,ROUND(AVG(salary),0) avg_job_salary,
round(avg(case when gender='M' then Salary end),0) avg_m_salary,
round(avg(case when gender='F' then Salary end),0) avg_f_salary,
round(avg(case when gender='F' then Salary end),0) - round(avg(case when gender='M' then Salary end),0) diff_in_avg
ROUND(AVG((SYSDATE-HIRE_DATE)/365),1) AS avg_years_svc
from hr.employees group by JOB_ID
order by JOB_ID;

I am assuming your formula for calculating avg_years_svc is as expected already, and no bifurcation is required for male and female candidates.

EDIT 2 :

PIVOT function may be able to help. This can help you to bifurcate averages based on male and female count.

Select job_id,avg_m_salary,avg_f_salary, avg_f_salary - avg_m_salary diff_salary from(
select job_id,salary from hr.employees) PIVOT
(avg(salary) for gender in('M' as avg_m_salary,'F' as avg_f_salary));

Upvotes: 1

nilsman
nilsman

Reputation: 396

SELECT Q.*, Q.M_AVG_SAL-Q.F_AVG_SAL DIFFERENCE FROM (
  SELECT JOB_ID
  , SUM(CASE WHEN gender = 'M' THEN SALARY ELSE 0 END)/SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) M_AVG_SAL 
  , ROUND(SUM(CASE WHEN gender = 'M' THEN SVC_LEN ELSE 0 END)/SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END), 1) M_AVG_LENGTH
  , SUM(CASE WHEN gender = 'F' THEN SALARY ELSE 0 END)/SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) F_AVG_SAL
  , ROUND(SUM(CASE WHEN gender = 'F' THEN SVC_LEN ELSE 0 END)/SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END), 1) F_AVG_LENGTH
  FROM ( 
    SELECT JOB_ID, SALARY
    , MONTHS_BETWEEN(SYSDATE, p.HIRE_DATE)/12 svc_len, GENDER
    FROM EMPLOYEES p 
  )
  GROUP BY JOB_ID
) Q;

Awwww, I forgot about division by zero, so:

SELECT JOB_ID
, DECODE(F_CNT, 0, 0, ROUND(F_LEN/F_CNT, 1)) F_AVG_LENGTH
, DECODE(F_CNT, 0, 0, F_SAL/F_CNT) F_AVG_SAL
, DECODE(M_CNT, 0, 0, ROUND(M_LEN/M_CNT, 1)) M_AVG_LENGTH 
, DECODE(M_CNT, 0, 0, M_SAL/M_CNT) M_AVG_SAL
, DECODE(M_CNT, 0, 0, M_SAL/M_CNT) - DECODE(F_CNT, 0, 0, F_SAL/F_CNT)
DIFFERENCE FROM (
  SELECT JOB_ID
  , SUM(CASE WHEN gender = 'M' THEN SALARY ELSE 0 END) M_SAL
  , SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) M_CNT 
  , SUM(CASE WHEN gender = 'M' THEN SVC_LEN ELSE 0 END) M_LEN
  , SUM(CASE WHEN gender = 'F' THEN SALARY ELSE 0 END) F_SAL
  , SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) F_CNT 
  , SUM(CASE WHEN gender = 'F' THEN SVC_LEN ELSE 0 END) F_LEN 
  FROM ( 
    SELECT JOB_ID, SALARY, GENDER
    , MONTHS_BETWEEN(SYSDATE, p.HIRE_DATE)/12 svc_len
    FROM EMPLOYEES p
  )
  GROUP BY JOB_ID
) Q;

Upvotes: 0

Related Questions