Rehna
Rehna

Reputation: 95

Optimizing postgres query

                                 QUERY PLAN                                   
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=32164.87..32164.89 rows=1 width=44) (actual time=221552.831..221552.831 rows=0 loops=1)
   ->  Sort  (cost=32164.87..32164.87 rows=1 width=44) (actual time=221552.827..221552.827 rows=0 loops=1)
         Sort Key: t.date_effective, t.acct_account_transaction_id, p.method, t.amount, c.business_name, t.amount
         ->  Nested Loop  (cost=22871.67..32164.86 rows=1 width=44) (actual time=221552.808..221552.808 rows=0 loops=1)
               ->  Nested Loop  (cost=22871.67..32160.37 rows=1 width=52) (actual time=221431.071..221546.619 rows=670 loops=1)
                     ->  Nested Loop  (cost=22871.67..32157.33 rows=1 width=43) (actual time=221421.218..221525.056 rows=2571 loops=1)
                           ->  Hash Join  (cost=22871.67..32152.80 rows=1 width=16) (actual time=221307.382..221491.019 rows=2593 loops=1)
                                 Hash Cond: ("outer".acct_account_id = "inner".acct_account_fk)
                                 ->  Seq Scan on acct_account a  (cost=0.00..7456.08 rows=365008 width=8) (actual time=0.032..118.369 rows=61295 loops=1)
                                 ->  Hash  (cost=22871.67..22871.67 rows=1 width=16) (actual time=221286.733..221286.733 rows=2593 loops=1)
                                       ->  Nested Loop Left Join  (cost=0.00..22871.67 rows=1 width=16) (actual time=1025.396..221266.357 rows=2593 loops=1)
                                             Join Filter: ("inner".orig_acct_payment_fk = "outer".acct_account_transaction_id)
                                             Filter: ("inner".link_type IS NULL)
                                             ->  Seq Scan on acct_account_transaction t  (cost=0.00..18222.98 rows=1 width=16) (actual time=949.081..976.432 rows=2596 loops=1)
                                                   Filter: ((("type")::text = 'debit'::text) AND ((transaction_status)::text = 'active'::text) AND (date_effective >= '2012-03-01'::date) AND (date_effective < '2012-04-01 00:00:00'::timestamp without time zone))
                                             ->  Seq Scan on acct_payment_link l  (cost=0.00..4648.68 rows=1 width=15) (actual time=1.073..84.610 rows=169 loops=2596)
                                                   Filter: ((link_type)::text ~~ 'return_%'::text)
                           ->  Index Scan using contact_pk on contact c  (cost=0.00..4.52 rows=1 width=27) (actual time=0.007..0.008 rows=1 loops=2593)
                                 Index Cond: (c.contact_id = "outer".contact_fk)
                     ->  Index Scan using acct_payment_transaction_fk on acct_payment p  (cost=0.00..3.02 rows=1 width=13) (actual time=0.005..0.005 rows=0 loops=2571)
                           Index Cond: (p.acct_account_transaction_fk = "outer".acct_account_transaction_id)
                           Filter: ((method)::text <> 'trade'::text)
               ->  Index Scan using contact_role_pk on contact_role  (cost=0.00..4.48 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=670)
                     Index Cond: ("outer".contact_id = contact_role.contact_fk)
                     Filter: (exchange_fk = 74)
Total runtime: 221553.019 ms

Upvotes: 3

Views: 4211

Answers (4)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656724

Your statement rewritten and formatted:

SELECT DISTINCT
       t.date_effective,
       t.acct_account_transaction_id,
       p.method,
       t.amount,
       c.business_name,
       t.amount
FROM   contact                  c
JOIN   contact_role            cr ON cr.contact_fk = c.contact_id
JOIN   acct_account             a ON a.contact_fk = c.contact_id 
JOIN   acct_account_transaction t ON t.acct_account_fk = a.acct_account_id 
JOIN   acct_payment             p ON p.acct_account_transaction_fk
                                   = t.acct_account_transaction_id
LEFT   JOIN acct_payment_link   l ON orig_acct_payment_fk
                                   = acct_account_transaction_id
                                        -- missing table-qualification!
                                 AND link_type like 'return_%'
                                        -- missing table-qualification!
WHERE  transaction_status = 'active'    -- missing table-qualification!
AND    cr.exchange_fk = 74
AND    t.type = 'debit'
AND    t.date_effective >= '2012-03-01'
AND    t.date_effective <  (date '2012-03-01' + interval '1 month')
AND    p.method != 'trade'
AND    l.link_type IS NULL
ORDER  BY t.date_effective DESC;
  • Explicit JOIN statements are preferable. I reordered your tables according to your JOIN logic.

  • Why (date '2012-03-01' + interval '1 month') instead of date '2012-04-01'?

  • Some table qualifications are missing. In a complex statement like this that's bad style. May be hiding a mistake.

The key to performance are indexes where appropriate, proper configuration of PostgreSQL and accurate statistics.

General advice on performance tuning in the PostgreSQL wiki.

Upvotes: 0

maniek
maniek

Reputation: 7307

Your problem is here:

->  Nested Loop Left Join  (cost=0.00..22871.67 rows=1 width=16) (actual time=1025.396..221266.357 rows=2593 loops=1)
    Join Filter: ("inner".orig_acct_payment_fk = "outer".acct_account_transaction_id)
    Filter: ("inner".link_type IS NULL)
        ->  Seq Scan on acct_account_transaction t  (cost=0.00..18222.98 rows=1 width=16) (actual time=949.081..976.432 rows=2596 loops=1)
                Filter: ((("type")::text = 'debit'::text) AND ((transaction_status)::text = 'active'::text) AND (date_effective >= '2012-03-01'::date) AND (date_effective   
            Seq Scan on acct_payment_link l  (cost=0.00..4648.68 rows=1 width=15) (actual time=1.073..84.610 rows=169 loops=2596)
                Filter: ((link_type)::text ~~ 'return_%'::text)

It expects to find 1 row in acct_account_transaction, while it finds 2596, and similarly for the other table.

You did not mention Your postgres version (could You?), but this should do the trick:

SELECT DISTINCT
    t.date_effective,
    t.acct_account_transaction_id,
    p.method,
    t.amount,
    c.business_name,
    t.amount
FROM
    contact c inner join contact_role on (c.contact_id=contact_role.contact_fk and contact_role.exchange_fk=74),
    acct_account a, acct_payment p,
    acct_account_transaction t
WHERE
    p.acct_account_transaction_fk=t.acct_account_transaction_id
    and t.type = 'debit'
    and transaction_status = 'active'
    and p.method != 'trade'
    and t.date_effective >= '2012-03-01'
    and t.date_effective < (date '2012-03-01' + interval '1 month')
    and c.contact_id=a.contact_fk and a.acct_account_id = t.acct_account_fk
    and not exists(
         select * from acct_payment_link l 
           where orig_acct_payment_fk == acct_account_transaction_id 
           and link_type like 'return_%'
    )
ORDER BY
    t.date_effective DESC

Also, try setting appropriate statistics target for relevant columns. Link to the friendly manual: http://www.postgresql.org/docs/current/static/sql-altertable.html

Upvotes: 4

vyegorov
vyegorov

Reputation: 22885

I remove my first suggestion, as it changes the nature of the query.

I see that there's too much time spent in the LEFT JOIN.

  1. First thing is to try to make only a single scan of the acct_payment_link table. Could you try rewriting your query to:

    ... LEFT JOIN (SELECT * FROM acct_payment_link
                   WHERE link_type LIKE 'return_%') AS l ...
    
  2. You should check your statistics, as there's difference between planned and returned numbers of rows.

  3. You haven't included tables' and indexes' definitions, it'd be good to take a look on those.

  4. You might also want to use contrib/pg_tgrm extension to build index on the acct_payment_link.link_type, but I would make this a last option to try out.

BTW, what is the PostgreSQL version you're using?

Upvotes: 0

Edmund
Edmund

Reputation: 10819

What are your indexes, and have you analysed lately? It's doing a table scan on acct_account_transaction even though there are several criteria on that table:

  • type
  • date_effective

If there are no indexes on those columns, then a compound one one (type, date_effective) could help (assuming there are lots of rows that don't meet the criteria on those columns).

Upvotes: 0

Related Questions