Minli Xu
Minli Xu

Reputation: 33

rails. select records with max()

I have a table like this, and I want to return the top two person (by name) with the highest salary, and also the record with the corresponding salary. Here is the table

id, name, salary
1, Tom, 200
2, Tom, 300
3, Bob, 400
4, Bob, 500
5, Alice, 600
6, Alice, 700

I used this command

Employer.select("employers.*, max(employers.salary) as maxsalary").group("employers.name").order("maxsalary desc").limit(2)

Desired return:

id, name, salary
6, Alice, 700
4, Bob, 500

What I got seems to be like this:

id, name, salary
5, Alice, 600
3, Bob, 400

Is there anyway to select the records responding to the max ? Any comment/answer is much appreciated.

Upvotes: 0

Views: 1978

Answers (1)

khaled_gomaa
khaled_gomaa

Reputation: 3412

This Question is very tricky actually! It seems very easy but it's not.

The Query:

Employer.joins(%Q|
  LEFT JOIN employers as e 
  ON 
  e.name = employers.name 
  AND 
  employers.salary < e.salary
|).where('e.salary IS NULL').order('employers.salary DESC').limit(2)

HOW DOES THAT WORK! (I've been there)

We want to make sure that we only have the highest salary for each employer and then get the highest 2 of those.


Some Theoretical Stuff (skip this part if you only want to understand the query)

Let Salary be a function S(name,id) where it returns a value given the name and id To prove that the given salary (S(name,id)) is the highest we have to prove that We want to prove either

  • ∀x S(name,id) > S(name,x) (this salary is higher than all other salaries for that name)

OR

  • ¬∃x S(name, id) < S(name, x) (there exists no higher salary for that name)

The first approach will need us to get all the records for that name which I do not really like.

The second one will need a smart way to say there can be no record higher than this one.


Back to SQL

If we left joins the table on the name and salary being less than the joined table:

%Q|
      LEFT JOIN employers as e 
      ON 
      e.name = employers.name 
      AND 
      employers.salary < e.salary
    |

we make sure that all records that has another record with higher salary for the same user to be joined:

employers.id, employers.name, employers.salary, e.id, e.name, e.salary
1           , Tom           , 200             , 2   , Tom   , 300
2           , Tom           , 300
3           , Bob           , 400             , 4   , Bob   , 500
4           , Bob           , 500
5           , Alice         , 600             , 6   , Alice   , 700
6           , Alice         , 700

That will help us filter for the highest salary for each employer with no grouping needed:

where('e.salary IS NULL')

employers.id, employers.name, employers.salary, e.id, e.name, e.salary
2           , Tom           , 300
4           , Bob           , 500
6           , Alice         , 700

Now all we need to do is sort:

order('employers.salary DESC')

employers.id, employers.name, employers.salary, e.id, e.name, e.salary
6           , Alice         , 700
4           , Bob           , 500
2           , Tom           , 300

Then limit

limit(2)

employers.id, employers.name, employers.salary, e.id, e.name, e.salary
6           , Alice         , 700
4           , Bob           , 500

And that's the answer we need.


Why don't we

1.

Employer.order('salary desc').limit(2)

Because this will get us the records with the highest salaries independent of the name

employers.id, employers.name, employers.salary
5           , Alice         , 600
6           , Alice         , 700

2.

Employer.select('DISTINCT(name)').order('salary desc').limit(2)

The problem with that is it will only keep the first appearance of the name then sort

employers.id, employers.name, employers.salary
1           , Tom           , 200
3           , Bob           , 400
5           , Alice         , 600

Upvotes: 4

Related Questions