Reputation: 65
Let's say I have two tables.
Table 1: Employee roster
| ID | Name |
| ---| ------------|
| 1 | Joe |
| 2 | Jim |
| 3 | Jane |
| 4 | Jill |
| 5 | Dilbert |
Table 2: Monthly paychecks
| Check # | Employee ID | Amount paid |
|---------|-------------|-------------|
| 11235 | 3 | 2000 |
| 51239 | 2 | 3000 |
| 72244 | 5 | 6500 |
| 61633 | 2 | 2300 |
| 14234 | 1 | 2900 |
| 91335 | 1 | 4500 |
| 13736 | 4 | 3000 |
| 41234 | 5 | 5000 |
| 71232 | 4 | 10000 |
| 88234 | 2 | 6000 |
| 23039 | 3 | 1200 |
| 42534 | 2 | 9000 |
| 74834 | 3 | 1230 |
| 38034 | 5 | 9900 |
| 91334 | 2 | 1280 |
| 24134 | 1 | 2000 |
So basically we have 5 employees and a table which keeps track of how much we pay them every month.
I need to make a query that would return all employees ordered by their BEST PAYCHECK EVER.
Like... Jill once had a 10000 paycheck, that makes her #1 on the result set. Dilbert should be #2 as he once got a 9900 paycheck. Etc.
It should be fast enough with tables with 10s of millions of entries
Thanks in advance!
Upvotes: 0
Views: 682
Reputation: 12628
As Golez says, with such amount of data, query might be slow.
Maybe you should keep track of max paycheck, not ask db every time?
You can do it from you program, or using triggers. Depends on you.
In this case you will have to execute your query just once, to fill table with initial data, and then it will update automaticly
Upvotes: 0
Reputation: 116110
select
p.ID,
e.NAME
from
Paychecks p
inner join Employee e on p.EmployeeID = e.ID
group by
p.ID
order by
max(p.AmountPaid) desc
A different way of writing, which looks more logical, but may be slower (you'd have to test) is:
select
e.ID,
e.NAME
from
Employee e
inner join Paychecks p on p.EmployeeID = e.ID
group by
e.ID
order by
max(p.AmountPaid) desc
With tens of millions of rows, every query is growing slow sometimes, but with the proper indexes, this is as fast as it gets. I think you basically need one index on Paychecks.EmployeeID and Paychecks.AmountPaid combined. And index on Employee.ID may help.
If the join is killing you in the end, you may execute two queries. The first one only uses the paychecks to group them by EmployeeID and order them by the max(PaycheckAmount), and a second one can be used to fetch the names for each ID. Sometimes joins cost more performance than you'd like, and when you got 10 million paychecks for 500 employees, it may be faster to do it in two steps, although it will mean that they have been working at the company for about 1600 years avarage. ;-)
Upvotes: 1