Reputation: 53

How can I get the row of the max column and the second highest value as the last column?

I have a dataset where each row is a salary change, and I am trying to get the last salary (that is the highest), with all the details, as the reason (if the salary has changed) and the date of last change, and as the last column I want the value of the salary before. How can I get it?

I've done with the last salary and the first salary using MIN(), but I want the salary before, not the first. The result that I am getting:

company_id   name   last_salary_change   reason   updated_at   salary_before

29   Fulano   5000  promotion   2019-05-20   1200
29   Ramon   25000  adjustment   2019-03-23   11500

The query that I've used:

SELECT p.company_id,
       u.name AS name,
       MAX(psc.amount/100) AS last_salary,
       MAX(psc.reason) AS reason,
       MAX(psc.updated_at) AS updated,
       MIN(psc.amount/100) AS first_salary
FROM lukla.profiles AS p

INNER JOIN lukla.profile_salary_changes AS psc
  ON p.id = psc.profile_id
INNER JOIN lukla.users AS u
  ON p.id = u.profile_id

WHERE p.company_id = 29 (filtered by a specific company)

GROUP BY 1, 2

What I am looking for:

company_id   name   last_salary_change   reason   updated_at   salary_before

29   Fulano   5000  promotion   2019-05-20   3500
29   Ramon   25000  adjustment   2019-03-23   24000

Upvotes: 3

Answers (5)

diequeiroz

Reputation: 1

Assuming your database structure looks like this: https://www.db-fiddle.com/f/i2BNYKhSaiu1xGPDPfeydr/1

You can run this query:

SELECT p.company_id,
    p.id,
    last_salary.amount/100 AS last_salary,
    last_salary.reason AS reason,
    last_salary.updated_at AS updated,
    prev_salary.updated_at as salary_before_updated_at,
    prev_salary.amount/100 AS salary_before
FROM profiles AS p
LEFT JOIN profile_salary_changes AS last_salary
    ON p.id = last_salary.profile_id 
        AND NOT EXISTS (
            SELECT * FROM profile_salary_changes psc2 
                WHERE psc2.profile_id = last_salary.profile_id 
                    AND psc2.updated_at > last_salary.updated_at)
LEFT JOIN profile_salary_changes AS prev_salary
    ON p.id = prev_salary.profile_id 
        AND prev_salary.updated_at != last_salary.updated_at 
        AND NOT EXISTS (
            SELECT * FROM profile_salary_changes psc3 
                WHERE psc3.profile_id = prev_salary.profile_id 
                    AND psc3.updated_at < last_salary.updated_at 
                    AND psc3.updated_at > prev_salary.updated_at)
WHERE p.company_id = 29;

If you have a primary field like id in your profile_salary_changes table, it's better to use it instead of updated_at in comparisons

Upvotes: 0

Belayer

Reputation: 14932

IMHO There are major issues with the entire concept. They stem from the assumptions that the first salary is always the lowest, and the last is the high test, and that salaries are always upward bound. None of these are true. The assumption that MAX can be used to get the latest also leads to a structure that would be rare at best. The section:

MAX(psc.amount/100) AS last_salary,
MAX(psc.reason) AS reason,
MAX(psc.updated_at) AS updated,

Even admitting that MAX psc.amount would get the correct value, we can assume the date is also selected correctly. However, the same does not apply for reason, care must always be taken getting the max of a text in combination with another column. There is no requirement that max text corresponds to max amount.

This was an interesting problem to play with so the following resolves these issues. Take it or ignore it as you see fit. Either way I found it interesting.

--- create "tables" as cte
with profiles (id, company_id) as
     ( values (61,29)
            , (62,29)
            , (63,29)
            , (64,29)
     )
   , profile_salary_changes (profile_id, amount, reason, updated_at) as
     ( values (64, 640000, 'Initial Hire', '2018-02-15'::date)
            , (64, 611200, 'Salary cut 4.5% across the board Co wide.', '2018-07-05'::date)
            , (64, 625600, '50% July''s cut recovery', '2018-12-09'::date)
            , (64, 710000, 'Promotion', '2019-02-09'::date)            
            , (63, 630000, 'Initial  ', '2019-02-15'::date)
            , (63, 600000, 'Transfer.', '2019-07-05'::date)
            , (63, 627500, 'COL', '2019-12-09'::date)
            , (61, 100000, 'Initial Only', '2019-05-09'::date)  
            , (62, 620000, 'First', '2019-03-09'::date)  
            , (62, 625000, 'Bonus', '2019-08-09'::date)        
     )
   , users (profile_id, name) as
     ( values (61, 'Test1')
            , (62, 'Test2')
            , (63, 'Test3')
            , (64, 'Test4')
     )
-- final selection pick up designated columns 
select company_id
     , name 
     , reason
     , trunc(amount/100.0,2)        last_salary
     , updated_at  
     , trunc(prev_sal/100.0,2)      prev_salary
     , trunc(first_salary/100.0,2)  first_salary
  from ( -- pick up the appropriate previous and first salary
         select x.*
              , lag (amount)                  over( partition by company_id, profile_id order by updated_at ) prev_sal
              , lag (amount, (rn-1)::integer) over( partition by company_id, profile_id order by updated_at ) first_salary      
           from (
                 -- gather all columns, number each row for company and profile, and get number of total rows for each set 
                 select p.company_id
                      , u.name
                      , psc.*  
                      , row_number() over( partition by p.company_id, psc.profile_id order by updated_at) rn
                      , count(*)     over( partition by p.company_id, psc.profile_id) r
                   from profile_salary_changes psc
                   join profiles               p on (p.id = psc.profile_id)
                   join users                  u on (p.id = u.profile_id)
                ) x
         ) z
 -- select only the last row in each set. Additional salary values have been attached 
 where r = rn 
 order by name;

Upvotes: 0

Eduardo Kaneko

Reputation: 53

The answer is based on the @Gordon's answer, the only diff is that I've added

      JOIN lukla.profiles p
      ON p.id = psc.profile_id 
      JOIN lukla.users u
      ON p.id = u.profile_id

inside the first JOIN.

SELECT p.company_id,
       u.name AS name,
       MAX(psc.amount/100) AS last_salary,
       MAX(psc.reason) AS reason,
       MAX(psc.updated_at) AS updated,
       MIN(psc.amount/100) AS first_salary,
       MAX(amount / 100) FILTER (WHERE seqnum = 2) as prev_salary
FROM lukla.profiles p 

JOIN
    (SELECT psc.*,
             ROW_NUMBER() OVER (PARTITION BY p.company_id, u.name ORDER BY psc.updated_at) as seqnum
      FROM lukla.profile_salary_changes psc
      JOIN lukla.profiles p
      ON p.id = psc.profile_id 
      JOIN lukla.users u
      ON p.id = u.profile_id

     ) psc
  ON p.id = psc.profile_id 
JOIN lukla.users u
  ON p.id = u.profile_id

WHERE p.company_id = 29
GROUP BY 1, 2;

Upvotes: 1

Gordon Linoff

Reputation: 1271231

You can use window functions with conditional aggregation:

SELECT p.company_id,
       u.name AS name,
       MAX(psc.amount/100) AS last_salary,
       MAX(psc.reason) AS reason,
       MAX(psc.updated_at) AS updated,
       MIN(psc.amount/100) AS first_salary,
       MAX(amount / 100) FILTER (WHERE seqnum = 2) as prev_salary
FROM lukla.profiles p INNER JOIN
     (SELECT ps.*,
             ROW_NUMBER() OVER (PARTITION BY p.company_id, u.name ORDER BY psc.updated_at) as seqnum
      FROM lukla.profile_salary_changes psc
     ) psc
     ON p.id = psc.profile_id INNER JOIN
     lukla.users u
     ON p.id = u.profile_id
WHERE p.company_id = 29 (filtered by a specific company)
GROUP BY 1, 2;

Two comments:

You are making the assumption that the first salary is the lowest. That is not always true.
This assumes that the salary change table really does have only salary changes and the other JOINs are not bringing in new rows.

Upvotes: 0

Ankit Bajpai

Reputation: 13527

It seems you need second maximum salary for that column. You can try below query -

SELECT p.company_id,
       u.name AS name,
       MAX(psc.amount/100) AS last_salary,
       MAX(psc.reason) AS reason,
       MAX(psc.updated_at) AS updated,
       (SELECT MAX(psc.amount/100)
        FROM lukla.profile_salary_changes psc2
        WHERE psc2.profile_id = profile_salary_changes.profile_id
        AND (psc.amount/100) < (SELECT MAX(psc.amount/100)
                                FROM lukla.profile_salary_changes psc2
                                WHERE psc2.profile_id = profile_salary_changes.profile_id)) salary_before
FROM lukla.profiles AS p
INNER JOIN lukla.profile_salary_changes AS psc
      ON p.id = psc.profile_id
INNER JOIN lukla.users AS u
ON p.id = u.profile_id
WHERE p.company_id = 29 (filtered by a specific company)
GROUP BY 1, 2

Though I have not tested it but it should work.

Upvotes: 0

How can I get the row of the max column and the second highest value as the last column?

Answers (5)

Related Questions