Reputation: 721
I was trying to understand how PostgreSQL uses indexes and what its execution plans look like.
I executed the query below:
EXPLAIN(VERBOSE,ANALYZE)
SELECT SUM("Orders"."order_subtotal_net_after_discount") AS "Total sum of Order Subtotal Net After Discount"
FROM "ol"."orders_test" AS "Orders"
LEFT OUTER JOIN "ol"."ga_transactions" AS "Ga Transactions" ON "Ga Transactions"."transaction_id" = "Orders"."order_number"
WHERE ( to_char(created_at_order,'YYYY-MM-DD')=to_char(now(),'YYYY-MM-DD')
AND "Orders"."order_state_2"<>'canceled'
AND "Orders"."order_state_2" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'canceled'
AND "Orders"."order_state_1" <> 'pending_payment'
AND "Orders"."order_state_1" <> 'closed'
AND "Orders"."order_state_2" <> 'closed'
)
I got the execution plan below and found that a sequential scan is happening with the following filter:
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))
Whole Query Plan
Aggregate (cost=157385.36..157385.37 rows=1 width=32) (actual time=840.221..840.221 rows=1 loops=1)
Output: sum("Orders".order_subtotal_net_after_discount)
-> Nested Loop Left Join (cost=0.42..157378.56 rows=2717 width=6) (actual time=0.017..840.095 rows=470 loops=1)
Output: "Orders".order_subtotal_net_after_discount
-> Seq Scan on ol.orders_test "Orders" (cost=0.00..148623.91 rows=2717 width=16) (actual time=0.009..838.144 rows=470 loops=1)
Output: "Orders".rno, "Orders".order_id, "Orders".order_number, "Orders".invoice_id, "Orders".invoice_number, "Orders".store_id, "Orders".customer_id, "Orders".real_customer_id, "Orders".shipping_address_id, "Orders".order_state_1, "Orders".order_state_2, "Orders".invoice_state, "Orders".shipping_street, "Orders".shipping_city, "Orders".shipping_postcode, "Orders".shipping_country_id, "Orders".shipping_description, "Orders".coupon_code, "Orders".first_order_of_customer, "Orders".payment_method, "Orders".order_subtotal_net, "Orders".order_subtotal_net_after_discount, "Orders".order_subtotal, "Orders".order_shipment, "Orders".order_shipping_tax, "Orders".order_discount, "Orders".order_tax_total, "Orders".order_grand_total, "Orders".order_total_paid, "Orders".order_refunded_total, "Orders".order_total_open, "Orders".invoice_subtotal_net, "Orders".invoice_subtotal_net_after_discount, "Orders".invoice_subtotal, "Orders".invoice_shipment, "Orders".invoice_shipping_tax, "Orders".invoice_discount, "Orders".invoice_tax_total, "Orders".invoice_grand_total, "Orders".invoice_refunded_total, "Orders".created_at_order, "Orders".created_at_invoice, "Orders".updated_at_invoice, "Orders".customer_email, "Orders".row_number, "Orders".nthorder, "Orders".time_since_last_order, "Orders".time_since_first_order
Filter: ((("Orders".order_state_2)::text <> 'canceled'::text) AND (("Orders".order_state_2)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'canceled'::text) AND (("Orders".order_state_1)::text <> 'pending_payment'::text) AND (("Orders".order_state_1)::text <> 'closed'::text) AND (("Orders".order_state_2)::text <> 'closed'::text) AND (to_char("Orders".created_at_order, 'YYYY-MM-DD'::text) = to_char(now(), 'YYYY-MM-DD'::text)))
Rows Removed by Filter: 654356
-> Index Only Scan using ga_transactions_transaction_id_idx on ol.ga_transactions "Ga Transactions" (cost=0.42..3.21 rows=1 width=10) (actual time=0.004..0.004 rows=0 loops=470)
Output: "Ga Transactions".transaction_id
Index Cond: ("Ga Transactions".transaction_id = ("Orders".order_number)::text)
Heap Fetches: 0
Planning Time: 0.540 ms
Execution Time: 840.255 ms
I created the index below, but it seems the query is not using it.
Do I have to include all of the output columns in the INCLUDE list?
Adding more indexes will increase update time, so is there a good way to identify efficient indexes for a table?
CREATE INDEX "IX_states"
ON ol.orders_test USING btree
(order_state_2 ASC NULLS LAST, order_state_1 ASC NULLS LAST, created_at_order ASC NULLS LAST)
INCLUDE(rno, order_id, order_number, invoice_id, invoice_number, store_id, customer_id, real_customer_id, shipping_address_id, invoice_state, shipping_street, shipping_city, shipping_postcode, shipping_country_id, shipping_description, coupon_code, first_order_of_customer, payment_method, order_subtotal_net_after_discount, created_at_invoice, customer_email, nthorder, time_since_last_order, time_since_first_order)
WITH (FILLFACTOR=90)
TABLESPACE pg_default;
Upvotes: 0
Views: 49
Reputation: 31
You made the index with created_at_order as a timestamp, but in the WHERE clause you are using a function conversion and comparing strings.
I would suggest that you edit the WHERE clause to something like this:
WHERE created_at_order BETWEEN TIMESTAMP 'today' AND now()
Upvotes: 0
Reputation:
No, there is no need to include all columns in an index. An index is used to find a few rows quickly. It is no magic bullet that suddenly makes everything faster. If you include all table columns in the index, scanning the index will take just as long as a Seq Scan.
You definitely want to index created_at_order.
create index on orders_test (created_at_order);
However, your expression to_char(created_at_order, 'YYYY-MM-DD')
will not make use of that index. The best thing is to change your condition:
where created_at_order >= current_date and created_at_order < current_date + 1
That would make use of the above index. If you don't want to use such an expression, then you need a specialized index for the expression. Instead of converting it to a text value, I would index the date value:
create index on orders_test ( (created_at_order::date) );
Indexing the date value also has the advantage that the index is smaller: a date only takes 4 bytes, whereas the string '2019-10-15' takes ten bytes of storage.
Then the following would use that index:
where created_at_order::date = current_date
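Putting it together, the original query with the rewritten date filter would look like this (a sketch; the join and the state conditions are unchanged from the question, and NOT IN is just shorthand for the chained <> comparisons):

SELECT SUM("Orders"."order_subtotal_net_after_discount") AS "Total sum of Order Subtotal Net After Discount"
FROM "ol"."orders_test" AS "Orders"
LEFT OUTER JOIN "ol"."ga_transactions" AS "Ga Transactions" ON "Ga Transactions"."transaction_id" = "Orders"."order_number"
WHERE "Orders"."created_at_order"::date = current_date
AND "Orders"."order_state_1" NOT IN ('canceled', 'pending_payment', 'closed')
AND "Orders"."order_state_2" NOT IN ('canceled', 'pending_payment', 'closed');

Running EXPLAIN (ANALYZE) on this version should then show a scan on the expression index instead of the Seq Scan.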
If the additional restrictions on the state columns further reduce the rows, you can use a partial index for that:
create index on orders_test ( (created_at_order::date) )
WHERE order_state_2 <>'canceled'
AND order_state_2 <> 'pending_payment'
AND order_state_1 <> 'canceled'
AND order_state_1 <> 'pending_payment'
AND order_state_1 <> 'closed'
AND order_state_2 <> 'closed';
This will make the index smaller and finding rows in it faster. However, that index will no longer be used for queries that do not include those conditions.
If you don't use all of those conditions all the time, then reduce them to those that you always use.
If those additional conditions don't really reduce the number of rows because the condition on "created today" is already very restrictive, then you can leave them out.
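As for finding out which indexes are actually worth keeping: PostgreSQL records how often each index is used in the pg_stat_user_indexes statistics view, so indexes with a low scan count are candidates for removal (a sketch; the schema name is taken from the question):

SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'ol'
ORDER BY idx_scan;

Indexes that still show idx_scan = 0 after a representative workload are only costing you update time.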
Upvotes: 2