How to optimize multiply join query?

I've got a query like that

SELECT DISTINCT p.id 
FROM person p 
    INNER JOIN person_func pf1 
        ON p.id = pf1.person_id 
    INNER JOIN Func f1 
        ON f1.id = pf1.func_id
    LEFT JOIN person_location pf2 
        ON p.id = pf2.person_id 
    LEFT JOIN Location f2 
        ON f2.id = pf2.location_id AND f2.val='1'
    LEFT JOIN person_location pf3 
        ON p.id = pf3.person_id 
    LEFT JOIN Location f3 
        ON f3.id = pf3.location_id and f3.val='3'
WHERE f2.val IS NOT NULL OR f3.val IS NOT NULL;

And in common there are 9-10 joins like these. And it runs very-very slow. I've added indexes on person_func(person_id), person_location(person_id) but it didn't help. What can I do to optimise it?

An example - SQLFiddle Example

Upvotes: 0

Answers (2)

Narnian

Reputation: 3908

The WHERE clause actually seems to be redundant. You've already joined based on the values being specific values, then your WHERE clause verifies those values are NOT NULL, so all that does is verify that the join occurred. So, if you are only taking the rows where joins occurred, you can just use an INNER JOIN rather than a LEFT JOIN. ==> f2.val will only be null if the LEFT JOIN did not occur.

LEFT JOIN Location f2 ON f2.id = pf2.location_id AND f2.val='1'

WHERE f2.val IS NOT NULL

So, you could probably just try an inner join in this case.

You could also identify the ID's that correspond to the values you want and join on those instead. Those would be primary keys and could be much more efficient.

Identifying The Real Issue

I would also suggest that you take out each join one by one and rerun the query, noting the time reduction each time. This will hopefully show you which join or where clause element is causing the issue.

Run the query as it is.
Remove first where clause expression.
Remove second where clause expression.
Remove one join
Remove another join...

Upvotes: 0

Erwin Brandstetter

Reputation: 659367

If more than ~ 5% of the table have val IS NOT NULL (which is very likely the case) an index on val will be useless for the WHERE clause.

It might help with the JOIN clause though, where you join on a particular value. However, even better: a multi-column index:

CREATE INDEX some_name_idx ON Location (location_id, val)

Depending on the specific situation (in particular, if val = 1 and val = 3 are rare) , partial indexes may serve you even better:

CREATE INDEX some_name1_idx ON Location (location_id)
WHERE val = 1

CREATE INDEX some_name3_idx ON Location (location_id)
WHERE val = 3

Other than that you need an index on every column that is used in a JOIN or WHERE condition, if the values used are not common-place (less than ~ 5% of the table) - which is true, normally.

All the usual advice for performance optimization applies.

And if you need more specific advice, you need to post a lot mor information. Read the tag wiki of postgresql-performance.

Since you are joining so many tables, you may surpass some limit where Postgres just can't find the best plan any more (too many possible variations). It may become important to write the most selective JOINs first. Generally JOIN comes before LEFT JOIN in this case. Read more in the manual here.

`CROSS JOIN` by proxy

So you have like 10 LEFT JOINs. Example: If half of those have 3 matches, you multiply the row count by 3^5 = 243. Or if all of them have 5 matches, you multiply by 5^10 = 9765625. This has to result in terrible performance. And all for nothing, since you only want DISTINCT id in the end.

The icing on the cake: to get DISTINCT person.id, all those LEFT JOINs are 100 % useless. They don't change a thing. Just remove them all.

As for JOIN: Replace those with EXISTS to avoid multiplying your rows. Like:

EXISTS (SELECT 1 FROM Func f1 WHERE f1.id = pf1.func_id)

Upvotes: 1

How to optimize multiply join query?

Answers (2)

CROSS JOIN by proxy

Related Questions

`CROSS JOIN` by proxy