Reputation: 721
I was trying to understand how PostgreSQL uses indexes and what its execution plans look like.
I executed the query below:
EXPLAIN(VERBOSE,ANALYZE)
SELECT SUM("Orders"."order_subtotal_net_after_discount") AS "Total sum of Order Subtotal Net After Discount"
FROM "ol"."orders_test" AS "Orders"
LEFT OUTER JOIN "ol"."ga_transactions" AS "Ga Transactions" ON "Ga Transactions"."transaction_id" = "Orders"."order_number"
WHERE ( to_char(created_at_order,'YYYY-MM-DD')=to_char(now(),'YYYY-MM-DD')
AND "Orders"."order_state_2"<>'canceled'
AND "Orders"."order_state_2" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'canceled'
AND "Orders"."order_state_1" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'closed'
AND "Orders"."order_state_2" <> 'closed'
)
I got the execution plan below and found that a sequential scan is happening with the following filter:
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))
Whole Query Plan
Aggregate (cost=157385.36..157385.37 rows=1 width=32) (actual time=840.221..840.221 rows=1 loops=1)
Output: sum("Orders".order_subtotal_net_after_discount)
-> Nested Loop Left Join (cost=0.42..157378.56 rows=2717 width=6) (actual time=0.017..840.095 rows=470 loops=1)
Output: "Orders".order_subtotal_net_after_discount
-> Seq Scan on ol.orders_test "Orders" (cost=0.00..148623.91 rows=2717 width=16) (actual time=0.009..838.144 rows=470 loops=1)
Output: "Orders".rno, "Orders".order_id, "Orders".order_number, "Orders".invoice_id, "Orders".invoice_number, "Orders".store_id, "Orders".customer_id, "Orders".real_customer_id, "Orders".shipping_address_id, "Orders".order_state_1, "Orders".order_state_2, "Orders".invoice_state, "Orders".shipping_street, "Orders".shipping_city, "Orders".shipping_postcode, "Orders".shipping_country_id, "Orders".shipping_description, "Orders".coupon_code, "Orders".first_order_of_customer, "Orders".payment_method, "Orders".order_subtotal_net, "Orders".order_subtotal_net_after_discount, "Orders".order_subtotal, "Orders".order_shipment, "Orders".order_shipping_tax, "Orders".order_discount, "Orders".order_tax_total, "Orders".order_grand_total, "Orders".order_total_paid, "Orders".order_refunded_total, "Orders".order_total_open, "Orders".invoice_subtotal_net, "Orders".invoice_subtotal_net_after_discount, "Orders".invoice_subtotal, "Orders".invoice_shipment, "Orders".invoice_shipping_tax, "Orders".invoice_discount, "Orders".invoice_tax_total, "Orders".invoice_grand_total, "Orders".invoice_refunded_total, "Orders".created_at_order, "Orders".created_at_invoice, "Orders".updated_at_invoice, "Orders".customer_email, "Orders".row_number, "Orders".nthorder, "Orders".time_since_last_order, "Orders".time_since_first_order
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))
Rows Removed by Filter: 654356
-> Index Only Scan using ga_transactions_transaction_id_idx on ol.ga_transactions "Ga Transactions" (cost=0.42..3.21 rows=1 width=10) (actual time=0.004..0.004 rows=0 loops=470)
Output: "Ga Transactions".transaction_id
Index Cond: ("Ga Transactions".transaction_id = ("Orders".order_number)::text)
Heap Fetches: 0
Planning Time: 0.540 ms
Execution Time: 840.255 ms
I created the index below, but it seems the query is not using it.
Do I have to include all of the output columns in the INCLUDE list?
Adding more indexes will increase update time, so is there a good way to identify efficient indexes for a table?
CREATE INDEX "IX_states"
ON ol.orders_test USING btree
(order_state_2 ASC NULLS LAST, order_state_1 ASC NULLS LAST, created_at_order ASC NULLS LAST)
INCLUDE(rno, order_id, order_number, invoice_id, invoice_number, store_id, customer_id, real_customer_id, shipping_address_id, invoice_state, shipping_street, shipping_city, shipping_postcode, shipping_country_id, shipping_description, coupon_code, first_order_of_customer, payment_method, order_subtotal_net_after_discount, created_at_invoice, customer_email, nthorder, time_since_last_order, time_since_first_order)
WITH (FILLFACTOR=90)
TABLESPACE pg_default;
Upvotes: 0
Views: 49
Reputation: 31
You made the index with created_at_order as a timestamp, but in the WHERE clause you are using a function conversion and comparing strings.
I would suggest that you edit the WHERE clause to something like this:
WHERE created_at_order BETWEEN TIMESTAMP 'today' AND now()
Upvotes: 0
Reputation:
No, there is no need to include all columns in an index. An index is used to find a few rows quickly. It is no magic bullet that suddenly makes everything faster. If you include all table columns in the index, scanning the index will take just as long as a Seq Scan.
You definitely want to index created_at_order.
create index on orders_test (created_at_order);
However, your expression to_char(created_at_order, 'YYYY-MM-DD')
will not make use of that index. The best thing is to change your condition:
where created_at_order >= current_date and created_at_order < current_date + 1
That would make use of the above index. If you don't want to use such an expression, then you need a specialized index for the expression. Instead of converting it to a text value, I would index the date value:
create index on orders_test ( (created_at_order::date) );
Indexing the date value also has the advantage that the index is smaller: a date only takes 4 bytes, whereas the string '2019-10-15' takes ten bytes of storage.
Then the following would use that index:
where created_at_order::date = current_date
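Putting it together, the original query with the rewritten date filter would look like this (a sketch; the join and the state conditions are unchanged from the question, and NOT IN is just shorthand for the chained <> comparisons):

SELECT SUM("Orders"."order_subtotal_net_after_discount") AS "Total sum of Order Subtotal Net After Discount"
FROM "ol"."orders_test" AS "Orders"
LEFT OUTER JOIN "ol"."ga_transactions" AS "Ga Transactions" ON "Ga Transactions"."transaction_id" = "Orders"."order_number"
WHERE "Orders"."created_at_order"::date = current_date
AND "Orders"."order_state_1" NOT IN ('canceled', 'pending_payment', 'closed')
AND "Orders"."order_state_2" NOT IN ('canceled', 'pending_payment', 'closed');

Running EXPLAIN (ANALYZE) on this version should then show a scan on the expression index instead of the Seq Scan.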
If the additional restrictions on the state columns further reduce the rows, you can use a partial index for that:
create index on orders_test ( (created_at_order::date) )
WHERE order_state_2 <>'canceled'
AND order_state_2 <> 'pending_payment'
AND order_state_1 <> 'canceled'
AND order_state_1 <> 'pending_payment'
AND order_state_1 <> 'closed'
AND order_state_2 <> 'closed';
This will make the index smaller and finding rows in it faster. However, that index will no longer be used for queries that do not include those conditions.
If you don't use all of those conditions all the time, then reduce them to those that you always use.
If those additional conditions don't really reduce the number of rows because the condition on "created today" is already very restrictive, then you can leave them out.
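As for finding out which indexes are actually worth keeping: PostgreSQL records how often each index is used in the pg_stat_user_indexes statistics view, so indexes with a low scan count are candidates for removal (a sketch; the schema name is taken from the question):

SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'ol'
ORDER BY idx_scan;

Indexes that still show idx_scan = 0 after a representative workload are only costing you update time.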
Upvotes: 2