Reputation: 16120
Let me know if this should be posted on DBA.stackexchange.com instead...
I have the following query:
SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
ON jobs.service_of_process_id = service_of_processes.id
WHERE
(jobs.account_id = 250093
OR court_cases.account_id = 250093)
ORDER BY
court_cases.court_date DESC NULLS LAST,
court_cases.id DESC
LIMIT 30
OFFSET 0;
But it takes a good 2-4 seconds to run, and in a web application this is unacceptable for a single query.
I ran EXPLAIN (ANALYZE, BUFFERS)
on the query as suggested on the PostgreSQL wiki, and have put the results here: http://explain.depesz.com/s/Yn6
The table definitions for those tables involved in the query is here (including the indexes on foreign key relationships):
http://sqlfiddle.com/#!15/114c6
Is it having issues using the indexes because the WHERE
clause is querying from two different tables? What kind of index or change to the query can I make to make this run faster?
These are the current sizes of the tables in question:
PSQL=# select count(*) from service_of_processes;
count
--------
103787
(1 row)
PSQL=# select count(*) from jobs;
count
--------
108995
(1 row)
PSQL=# select count(*) from court_cases;
count
-------
84410
(1 row)
EDIT: I'm on Postgresql 9.3.1, if that matters.
Upvotes: 4
Views: 2343
Reputation: 1270833
or
clauses can make optimizing a query difficult. One idea is to split the two parts of the query into two separate subqueries. This actually simplifies one of them a lot (the one on court_cases.account_id
).
Try this version:
(SELECT cc.*
FROM "court_cases" cc
WHERE cc.account_id = 250093
ORDER BY cc.court_date DESC NULLS LAST,
cc.id DESC
LIMIT 30
) UNION ALL
(SELECT cc.*
FROM "court_cases" cc LEFT OUTER JOIN
service_of_processes sop
ON sop.court_case_id = cc.id LEFT OUTER JOIN
jobs j
ON j.service_of_process_id = sop.id
WHERE (j.account_id = 250093 AND cc.account_id <> 250093)
ORDER BY cc.court_date DESC NULLS LAST, id DESC
LIMIT 30
)
ORDER BY court_date DESC NULLS LAST,
id DESC
LIMIT 30 OFFSET 0;
And add the following indexes:
create index court_cases_accountid_courtdate_id on court_cases(account_id, court_date, id);
create index jobs_accountid_sop on jobs(account_id, service_of_process_id);
Note that the second query uses and cc.count_id <> 250093
, which prevents duplicate records. This eliminates the need for distinct
or for union
(without union all
).
Upvotes: 3
Reputation: 631
I'll try modifying the query as the following:
SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
ON jobs.service_of_process_id = service_of_processes.id and jobs.account_id = 250093
WHERE
(court_cases.account_id = 250093)
ORDER BY
court_cases.court_date DESC NULLS LAST,
court_cases.id DESC
LIMIT 30
OFFSET 0;
I think that the issue is in the fact that the where filter is not properly decomposed by query planner optimizer, a really strange performance bug
Upvotes: 0