How can I make this query run faster in postgres

Question

I have this query which takes 86 sec to execute.

select cust_id customer_id,
       cust_first_name customer_first_name,
       cust_last_name customer_last_name,
       cust_prf customer_prf,
       cust_birth_country customer_birth_country,
       cust_login customer_login,
       cust_email_address customer_email_address,
       date_year ddyear,
       sum(((stock_ls_price-stock_ws_price-stock_ds_price)+stock_es_price)/2) total_yr,
       's' stock_type
 from customer, stock, date
 where customer_k = stock_customer_k
   and stock_soldate_k = date_k
 group by cust_id, cust_first_name, cust_last_name, cust_prf, cust_birth_country, cust_login, cust_email_address, date_year;

EXPLAIN ANALYZE RESULT:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate  (cost=639753.55..764040.06 rows=2616558 width=213) (actual time=81192.575..86536.398 rows=190581 loops=1)
   Group Key: customer.cust_id, customer.cust_first_name, customer.cust_last_name, customer.cust_prf, customer.cust_birth_country, customer.cust_login, customer.cust_email_address, date.date_year
   ->  Sort  (cost=639753.55..646294.95 rows=2616558 width=213) (actual time=81192.468..83977.960 rows=2685453 loops=1)
         Sort Key: customer.cust_id, customer.cust_first_name, customer.cust_last_name, customer.cust_prf, customer.cust_birth_country, customer.cust_login, customer.cust_email_address, date.date_year
         Sort Method: external merge  Disk: 460920kB
         ->  Hash Join  (cost=6527.66..203691.58 rows=2616558 width=213) (actual time=60.500..2306.082 rows=2685453 loops=1)
               Hash Cond: (stock.stock_customer_k = customer.customer_k)
               ->  Merge Join  (cost=1423.66..144975.59 rows=2744641 width=30) (actual time=8.820..1412.109 rows=2750311 loops=1)
                     Merge Cond: (date.date_k = stock.stock_soldate_k)
                     ->  Index Scan using date_key_idx on date (cost=0.29..2723.33 rows=73049 width=8) (actual time=0.013..7.164 rows=37622 loops=1)
                     ->  Index Scan using stock_soldate_k_index on stock  (cost=0.43..108829.12 rows=2880404 width=30) (actual time=0.004..735.043 rows=2750312 loops=1)
               ->  Hash  (cost=3854.00..3854.00 rows=100000 width=191) (actual time=51.650..51.650rows=100000 loops=1)
                     Buckets: 16384  Batches: 1  Memory Usage: 16139kB
                     ->  Seq Scan on customer  (cost=0.00..3854.00 rows=100000 width=191) (actual time=0.004..30.341 rows=100000 loops=1)
 Planning time: 1.761 ms
 Execution time: 86621.807 ms

I have work_mem=512MB. I have indexes created on cust_id, customer_k, stock_customer_k, stock_soldate_k and date_k.

There are about 100,000 rows in customer, 3,000,000 rows in stock and 80,000 rows in date.

How can I make this query run faster? I would appreciate any help!

TABLE DEFINITIONS

date

 Column              |     Type      | Modifiers
---------------------+---------------+-----------
 date_k              | integer       | not null
 date_id             | character(16) | not null
 date_date           | date          |
 date_year           | integer       |

stock

Column                 |     Type     | Modifiers
-----------------------+--------------+-----------
 stock_soldate_k       | integer      |
 stock_soltime_k       | integer      |
 stock_customer_k      | integer      |
 stock_ds_price        | numeric(7,2) |
 stock_es_price        | numeric(7,2) |
 stock_ls_price        | numeric(7,2) |
 stock_ws_price        | numeric(7,2) |

customer:

Column                     |         Type          | Modifiers
---------------------------+-----------------------+-----------
 customer_k                | integer               | not null
 customer_id               | character(16)         | not null
 cust_first_name           | character(20)         |
 cust_last_name            | character(30)         |
 cust_prf                  | character(1)          |
 cust_birth_country        | character varying(20) |
 cust_login                | character(13)         |
 cust_email_address        | character(50)         |

TABLE "stock" CONSTRAINT "st1" FOREIGN KEY (stock_soldate_k) REFERENCES date(date_k)

"st2" FOREIGN KEY (stock_customer_k) REFERENCES customer(customer_k)

Renzo · Accepted Answer

Try this:

with stock_grouped as
     (select stock_customer_k, date_year, sum(((stock_ls_price-stock_ws_price-stock_ds_price)+stock_es_price)/2) total_yr
      from stock, date
      where stock_soldate_k = date_k
      group by stock_customer_k, date_year)
select cust_id customer_id,
       cust_first_name customer_first_name,
       cust_last_name customer_last_name,
       cust_prf customer_prf,
       cust_birth_country customer_birth_country,
       cust_login customer_login,
       cust_email_address customer_email_address,
       date_year ddyear,
       total_yr,
       's' stock_type
from customer, stock_grouped
where customer_k = stock_customer_k

This query anticipates the grouping over the join.

How can I make this query run faster in postgres

Answers (2)

Query

Data model

Related Questions