david
david

Reputation: 1077

Any way to improve performance for a query with many joins?

I have the following query that has 6 joins

EXPLAIN ANALYZE 
SELECT
   property.id,
   full_address,
   street_address,
   street.street,
   city.city as city,
   state.state_code as state_code,
   zipcode.zipcode as zipcode,
   property_tax_history.tax AS property_tax,
   property_tax_history.land AS land_value,
   property_tax_history.improvements AS improvements_value,
   property_tax_history.year AS tax_year,
   property_sale_history.date_event AS event_date,
   property_sale_history.event AS event,
   property_sale_history.price AS event_price 
FROM
   property 
   INNER JOIN
      street 
      ON street.id = property.street_id 
   INNER JOIN
      city 
      ON city.id = property.city_id 
   INNER JOIN
      state 
      ON state.id = property.state_id 
   INNER JOIN
      zipcode 
      ON zipcode.id = property.zipcode_id 
   LEFT JOIN
      property_sale_history 
      ON property_sale_history.property_id = property.id 
   LEFT JOIN
      property_tax_history 
      ON property_tax_history.property_id = property.id 
WHERE
   full_address = ?;

Below are the EXPLAIN ANALYZE results

Nested Loop Left Join  (cost=2.83..296.67 rows=6 width=227) (actual time=0.606..0.766 rows=28 loops=1)
  ->  Nested Loop Left Join  (cost=2.26..62.52 rows=1 width=299) (actual time=0.544..0.551 rows=4 loops=1)
        ->  Nested Loop  (cost=1.83..41.98 rows=1 width=284) (actual time=0.522..0.525 rows=1 loops=1)
              ->  Nested Loop  (cost=1.54..33.67 rows=1 width=294) (actual time=0.482..0.485 rows=1 loops=1)
                    ->  Nested Loop  (cost=1.27..25.32 rows=1 width=307) (actual time=0.439..0.441 rows=1 loops=1)
                          ->  Nested Loop  (cost=0.98..17.02 rows=1 width=314) (actual time=0.392..0.394 rows=1 loops=1)
                                ->  Index Scan using property_full_address on property  (cost=0.56..8.57 rows=1 width=318) (actual time=0.339..0.340 rows=1 loops=1)
                                      Index Cond: (full_address = '10951097-4Th-Ave-Chula-Vista-CA-91911'::citext)
                                ->  Index Scan using street_pkey on street  (cost=0.42..8.44 rows=1 width=28) (actual time=0.046..0.046 rows=1 loops=1)
                                      Index Cond: (id = property.street_id)
                          ->  Index Scan using city_id_pk on city  (cost=0.29..8.30 rows=1 width=25) (actual time=0.044..0.044 rows=1 loops=1)
                                Index Cond: (id = property.city_id)
                    ->  Index Scan using state_id_pk on state  (cost=0.28..8.32 rows=1 width=19) (actual time=0.041..0.041 rows=1 loops=1)
                          Index Cond: (id = property.state_id)
              ->  Index Scan using zipcode_id_pk on zipcode  (cost=0.29..8.30 rows=1 width=22) (actual time=0.037..0.037 rows=1 loops=1)
                    Index Cond: (id = property.zipcode_id)
        ->  Index Scan using property_sale_history_property_id on property_sale_history  (cost=0.43..20.50 rows=4 width=31) (actual time=0.019..0.021 rows=4 loops=1)
              Index Cond: (property_id = property.id)
  ->  Index Scan using property_tax_history_property_id on property_tax_history  (cost=0.56..233.54 rows=57 width=33) (actual time=0.014..0.021 rows=7 loops=4)
        Index Cond: (property_id = property.id)
Planning Time: 9.805 ms
Execution Time: 1.628 ms

With so many joins, I can see each one takes some time, and they add up. Is there anyway I can modify the query to speed things up? Would using a SSD drive help?

Upvotes: 0

Views: 52

Answers (1)

Bohemian
Bohemian

Reputation: 425033

The problem is here:

LEFT JOIN property_sale_history ON property_sale_history.property_id = property.id 
LEFT JOIN property_tax_history ON property_tax_history.property_id = property.id

There are independent joins to each table, and columns for both joins are in the result.

This creates an n2 problem with the number of rows returned. eg if there are 3 sale rows and 4 tax rows for a given property, there will be 12 rows for that property in the result. This is going to really hurt performance, and there’s no way to “fix” the performance problem, and such a query doesn’t make sense anyway.

You should use two separate queries - one for sale history and one for tax history.

You may consider creating a view for the common part of the two queries.

——

p.s. easing up on the new lines and left justifying everything after FROM will help readability.

Upvotes: 1

Related Questions