Segolene
Segolene

Reputation: 65

Ordering a MySQL result set by a MAX() value of another table

Let's say I have two tables.

Table 1: Employee roster

| ID | Name        |
| ---| ------------|
|  1 | Joe         |
|  2 | Jim         |
|  3 | Jane        |
|  4 | Jill        |
|  5 | Dilbert     |

Table 2: Monthly paychecks

| Check # | Employee ID | Amount paid |
|---------|-------------|-------------|
| 11235   |      3      |  2000       |
| 51239   |      2      |  3000       |
| 72244   |      5      |  6500       |
| 61633   |      2      |  2300       |
| 14234   |      1      |  2900       |
| 91335   |      1      |  4500       |
| 13736   |      4      |  3000       |
| 41234   |      5      |  5000       |
| 71232   |      4      |  10000      |
| 88234   |      2      |  6000       |
| 23039   |      3      |  1200       |
| 42534   |      2      |  9000       |
| 74834   |      3      |  1230       |
| 38034   |      5      |  9900       |
| 91334   |      2      |  1280       |
| 24134   |      1      |  2000       |

So basically we have 5 employees and a table which keeps track of how much we pay them every month.

I need to make a query that would return all employees ordered by their BEST PAYCHECK EVER.

Like... Jill once had a 10000 paycheck, that makes her #1 on the result set. Dilbert should be #2 as he once got a 9900 paycheck. Etc.

It should be fast enough with tables with 10s of millions of entries

Thanks in advance!

Upvotes: 0

Views: 682

Answers (2)

Uriil
Uriil

Reputation: 12628

As Golez says, with such amount of data, query might be slow.

Maybe you should keep track of max paycheck, not ask db every time?

You can do it from you program, or using triggers. Depends on you.

In this case you will have to execute your query just once, to fill table with initial data, and then it will update automaticly

Upvotes: 0

GolezTrol
GolezTrol

Reputation: 116110

select
  p.ID,
  e.NAME
from
  Paychecks p
  inner join Employee e on p.EmployeeID = e.ID
group by
  p.ID
order by
  max(p.AmountPaid) desc

A different way of writing, which looks more logical, but may be slower (you'd have to test) is:

select
  e.ID,
  e.NAME
from
  Employee e
  inner join Paychecks p on p.EmployeeID = e.ID
group by
  e.ID
order by
  max(p.AmountPaid) desc

With tens of millions of rows, every query is growing slow sometimes, but with the proper indexes, this is as fast as it gets. I think you basically need one index on Paychecks.EmployeeID and Paychecks.AmountPaid combined. And index on Employee.ID may help.

If the join is killing you in the end, you may execute two queries. The first one only uses the paychecks to group them by EmployeeID and order them by the max(PaycheckAmount), and a second one can be used to fetch the names for each ID. Sometimes joins cost more performance than you'd like, and when you got 10 million paychecks for 500 employees, it may be faster to do it in two steps, although it will mean that they have been working at the company for about 1600 years avarage. ;-)

Upvotes: 1

Related Questions