Reputation: 275
Simplified, I got the following situation. I've got two tables. One migration has multiple checks through checks.migration_id
. The Column checks.old
describes a type of check. Now I want to get for each migration the check with the biggest time where old
is true (query1) and false (query2).
There are about 30.000 migrations and each has around 1000 checks where old=true and 1000 checks where old=false. The table checks will grow quite extreme. The order of the checks is not given and could be totally mixed up.
I want to get the latest check for a maximum of 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I'm using PostgreSQL 9.3 and Rails 3.2 (shouldn't matter)
Whats the most efficient way to get the latest subrecord where old = true?
Table Migrations:
| ID |
|----|
| 1 |
| 2 |
Table Checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with a max in a subquery, but then I lose the information about checks.ok
and check.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that I get max(id)
instead of max(time)
.)
In Rails I tried to fetch for each Migration the latest Record which resulted in the 1+n Problem. I'm not able to include all Checks because there are way to much of them.
Upvotes: 3
Views: 333
Reputation: 658482
A simple solution with the Postgres specific DISTINCT ON
:
Query 1 ("for each migration the check with the biggest time
where old
is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
Invert the the WHERE
condition for Query 2:
...
WHERE NOT old
...
Details:
But if you want better read performance with big tables, use JOIN LATERAL
(Postgres 9.2+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Switch the condition on old
again for query 2.
For an unspecified "maximum of 150 migrations", use the commented alternative line.
Details:
Aside: don't use "time" as identifier. It's a reserved word in standard SQL and a basic type name in Postgres.
Upvotes: 1