Reputation: 103
I've got a query like that
SELECT DISTINCT p.id
FROM person p
INNER JOIN person_func pf1
ON p.id = pf1.person_id
INNER JOIN Func f1
ON f1.id = pf1.func_id
LEFT JOIN person_location pf2
ON p.id = pf2.person_id
LEFT JOIN Location f2
ON f2.id = pf2.location_id AND f2.val='1'
LEFT JOIN person_location pf3
ON p.id = pf3.person_id
LEFT JOIN Location f3
ON f3.id = pf3.location_id and f3.val='3'
WHERE f2.val IS NOT NULL OR f3.val IS NOT NULL;
And in common there are 9-10 joins like these. And it runs very-very slow. I've added indexes on person_func(person_id), person_location(person_id) but it didn't help. What can I do to optimise it?
An example - SQLFiddle Example
Upvotes: 0
Views: 618
Reputation: 3908
The WHERE clause actually seems to be redundant. You've already joined based on the values being specific values, then your WHERE clause verifies those values are NOT NULL, so all that does is verify that the join occurred. So, if you are only taking the rows where joins occurred, you can just use an INNER JOIN rather than a LEFT JOIN. ==> f2.val will only be null if the LEFT JOIN did not occur.
LEFT JOIN Location f2 ON f2.id = pf2.location_id AND f2.val='1'
WHERE f2.val IS NOT NULL
So, you could probably just try an inner join in this case.
You could also identify the ID's that correspond to the values you want and join on those instead. Those would be primary keys and could be much more efficient.
Identifying The Real Issue
I would also suggest that you take out each join one by one and rerun the query, noting the time reduction each time. This will hopefully show you which join or where clause element is causing the issue.
Upvotes: 0
Reputation: 659367
If more than ~ 5% of the table have val IS NOT NULL
(which is very likely the case) an index on val
will be useless for the WHERE
clause.
It might help with the JOIN clause though, where you join on a particular value. However, even better: a multi-column index:
CREATE INDEX some_name_idx ON Location (location_id, val)
Depending on the specific situation (in particular, if val = 1
and val = 3
are rare) , partial indexes may serve you even better:
CREATE INDEX some_name1_idx ON Location (location_id)
WHERE val = 1
CREATE INDEX some_name3_idx ON Location (location_id)
WHERE val = 3
Other than that you need an index on every column that is used in a JOIN or WHERE condition, if the values used are not common-place (less than ~ 5% of the table) - which is true, normally.
All the usual advice for performance optimization applies.
And if you need more specific advice, you need to post a lot mor information. Read the tag wiki of postgresql-performance.
Since you are joining so many tables, you may surpass some limit where Postgres just can't find the best plan any more (too many possible variations). It may become important to write the most selective JOINs first. Generally JOIN
comes before LEFT JOIN
in this case. Read more in the manual here.
CROSS JOIN
by proxySo you have like 10 LEFT JOINs. Example: If half of those have 3 matches, you multiply the row count by 3^5 = 243. Or if all of them have 5 matches, you multiply by 5^10 = 9765625. This has to result in terrible performance. And all for nothing, since you only want DISTINCT id
in the end.
The icing on the cake: to get DISTINCT person.id
, all those LEFT JOINs are 100 % useless. They don't change a thing. Just remove them all.
As for JOIN
: Replace those with EXISTS
to avoid multiplying your rows. Like:
EXISTS (SELECT 1 FROM Func f1 WHERE f1.id = pf1.func_id)
Upvotes: 1