Reputation: 137
I have this query which takes 86 sec to execute.
select cust_id customer_id,
cust_first_name customer_first_name,
cust_last_name customer_last_name,
cust_prf customer_prf,
cust_birth_country customer_birth_country,
cust_login customer_login,
cust_email_address customer_email_address,
date_year ddyear,
sum(((stock_ls_price-stock_ws_price-stock_ds_price)+stock_es_price)/2) total_yr,
's' stock_type
from customer, stock, date
where customer_k = stock_customer_k
and stock_soldate_k = date_k
group by cust_id, cust_first_name, cust_last_name, cust_prf, cust_birth_country, cust_login, cust_email_address, date_year;
EXPLAIN ANALYZE RESULT:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=639753.55..764040.06 rows=2616558 width=213) (actual time=81192.575..86536.398 rows=190581 loops=1)
  Group Key: customer.cust_id, customer.cust_first_name, customer.cust_last_name, customer.cust_prf, customer.cust_birth_country, customer.cust_login, customer.cust_email_address, date.date_year
  -> Sort (cost=639753.55..646294.95 rows=2616558 width=213) (actual time=81192.468..83977.960 rows=2685453 loops=1)
       Sort Key: customer.cust_id, customer.cust_first_name, customer.cust_last_name, customer.cust_prf, customer.cust_birth_country, customer.cust_login, customer.cust_email_address, date.date_year
       Sort Method: external merge Disk: 460920kB
       -> Hash Join (cost=6527.66..203691.58 rows=2616558 width=213) (actual time=60.500..2306.082 rows=2685453 loops=1)
            Hash Cond: (stock.stock_customer_k = customer.customer_k)
            -> Merge Join (cost=1423.66..144975.59 rows=2744641 width=30) (actual time=8.820..1412.109 rows=2750311 loops=1)
                 Merge Cond: (date.date_k = stock.stock_soldate_k)
                 -> Index Scan using date_key_idx on date (cost=0.29..2723.33 rows=73049 width=8) (actual time=0.013..7.164 rows=37622 loops=1)
                 -> Index Scan using stock_soldate_k_index on stock (cost=0.43..108829.12 rows=2880404 width=30) (actual time=0.004..735.043 rows=2750312 loops=1)
            -> Hash (cost=3854.00..3854.00 rows=100000 width=191) (actual time=51.650..51.650 rows=100000 loops=1)
                 Buckets: 16384 Batches: 1 Memory Usage: 16139kB
                 -> Seq Scan on customer (cost=0.00..3854.00 rows=100000 width=191) (actual time=0.004..30.341 rows=100000 loops=1)
Planning time: 1.761 ms
Execution time: 86621.807 ms
I have work_mem=512MB. I have indexes created on cust_id, customer_k, stock_customer_k, stock_soldate_k and date_k.
There are about 100,000 rows in customer, 3,000,000 rows in stock and 80,000 rows in date.
How can I make this query run faster? I would appreciate any help!
TABLE DEFINITIONS
date
Column | Type | Modifiers
---------------------+---------------+-----------
date_k | integer | not null
date_id | character(16) | not null
date_date | date |
date_year | integer |
stock
Column | Type | Modifiers
-----------------------+--------------+-----------
stock_soldate_k | integer |
stock_soltime_k | integer |
stock_customer_k | integer |
stock_ds_price | numeric(7,2) |
stock_es_price | numeric(7,2) |
stock_ls_price | numeric(7,2) |
stock_ws_price | numeric(7,2) |
customer:
Column | Type | Modifiers
---------------------------+-----------------------+-----------
customer_k | integer | not null
customer_id | character(16) | not null
cust_first_name | character(20) |
cust_last_name | character(30) |
cust_prf | character(1) |
cust_birth_country | character varying(20) |
cust_login | character(13) |
cust_email_address | character(50) |
TABLE "stock" CONSTRAINT "st1" FOREIGN KEY (stock_soldate_k) REFERENCES date(date_k)
"st2" FOREIGN KEY (stock_customer_k) REFERENCES customer(customer_k)
Upvotes: 0
Views: 1390
Reputation: 27414
Try this:
with stock_grouped as
(select stock_customer_k, date_year, sum(((stock_ls_price-stock_ws_price-stock_ds_price)+stock_es_price)/2) total_yr
from stock, date
where stock_soldate_k = date_k
group by stock_customer_k, date_year)
select cust_id customer_id,
cust_first_name customer_first_name,
cust_last_name customer_last_name,
cust_prf customer_prf,
cust_birth_country customer_birth_country,
cust_login customer_login,
cust_email_address customer_email_address,
date_year ddyear,
total_yr,
's' stock_type
from customer, stock_grouped
where customer_k = stock_customer_k
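This query performs the grouping before the join to customer, so the aggregation only has to handle the narrow stock and date columns instead of the wide customer rows.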
Upvotes: 1
Reputation: 32159
A big performance penalty comes from about 450MB of intermediate data being sorted on disk: Sort Method: external merge Disk: 460920kB. This happens because the planner first has to satisfy the join conditions between the 3 tables, which pulls the wide rows of table customer into the sort (width=213 in the plan), before the aggregation sum() can take place, even though the aggregation can be performed perfectly well on table stock alone.
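The question states work_mem=512MB, yet the sort still went to disk, so it may be worth checking which value the session really uses and whether a larger one avoids the spill. This is a minimal sketch, assuming the session is allowed to change work_mem and the server has memory to spare; the 1GB figure is only an illustration, not a recommendation. An in-memory sort generally needs more memory than the on-disk size reported in the plan, so a limit only slightly above 460920kB may still not be enough.

```sql
-- Value in effect for this session (may differ from what postgresql.conf says).
SHOW work_mem;

-- Raise it for this session only; '1GB' is an assumed, illustrative value.
SET work_mem = '1GB';

-- Re-run the original query and check whether "Sort Method" still reports "external merge".
EXPLAIN (ANALYZE, BUFFERS)
SELECT cust_id, cust_first_name, cust_last_name, cust_prf,
       cust_birth_country, cust_login, cust_email_address, date_year,
       sum(((stock_ls_price - stock_ws_price - stock_ds_price) + stock_es_price) / 2) AS total_yr
FROM customer, stock, date
WHERE customer_k = stock_customer_k
  AND stock_soldate_k = date_k
GROUP BY cust_id, cust_first_name, cust_last_name, cust_prf,
         cust_birth_country, cust_login, cust_email_address, date_year;

-- Go back to the configured default afterwards.
RESET work_mem;
```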
Because your tables are fairly large, you are better off reducing the number of eligible rows as early as possible, preferably before any joining. In this case that means doing the aggregation on table stock in a sub-query and joining that result to the other two tables:
SELECT c.cust_id AS customer_id,
c.cust_first_name AS customer_first_name,
c.cust_last_name AS customer_last_name,
c.cust_prf AS customer_prf,
c.cust_birth_country AS customer_birth_country,
c.cust_login AS customer_login,
c.cust_email_address AS customer_email_address,
d.date_year AS ddyear,
ss.total_yr,
's' stock_type
FROM (
SELECT
stock_customer_k AS ck,
stock_soldate_k AS sdk,
sum((stock_ls_price-stock_ws_price-stock_ds_price+stock_es_price)*0.5) AS total_yr
FROM stock
GROUP BY 1, 2) ss
JOIN customer c ON c.customer_k = ss.ck
JOIN date d ON d.date_k = ss.sdk;
The sub-query on stock will result in far fewer rows, depending on the average number of orders per customer per date. Also, in the sum() function, multiplying a numeric value by 0.5 is cheaper than dividing it by 2 (although in the grand scheme of things the gain will be marginal).
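Note that the sub-query groups by sale date, so the result has one row per customer per date, whereas the original query returns one row per customer per year. One way to restore the yearly totals is to aggregate a second time after joining date, and only then join the wide customer rows. This is a sketch rather than a tested drop-in: it assumes customer_k and date_k are unique (which the foreign keys st1 and st2 imply) and that the customer columns in the original GROUP BY identify a single customer; the aliases total_day and y are made up for illustration.

```sql
SELECT c.cust_id            AS customer_id,
       c.cust_first_name    AS customer_first_name,
       c.cust_last_name     AS customer_last_name,
       c.cust_prf           AS customer_prf,
       c.cust_birth_country AS customer_birth_country,
       c.cust_login         AS customer_login,
       c.cust_email_address AS customer_email_address,
       y.ddyear,
       y.total_yr,
       's'                  AS stock_type
FROM (
    -- Second aggregation: roll the per-date sums up to one row per customer per year.
    SELECT ss.ck,
           d.date_year       AS ddyear,
           sum(ss.total_day) AS total_yr
    FROM (
        -- First aggregation: touches only the narrow stock table.
        SELECT stock_customer_k AS ck,
               stock_soldate_k  AS sdk,
               sum((stock_ls_price - stock_ws_price - stock_ds_price + stock_es_price) * 0.5) AS total_day
        FROM stock
        GROUP BY stock_customer_k, stock_soldate_k
    ) ss
    JOIN date d ON d.date_k = ss.sdk
    GROUP BY ss.ck, d.date_year
) y
JOIN customer c ON c.customer_k = y.ck;
```

Comparing the EXPLAIN ANALYZE output of this form with the original shows whether the sort on the wide customer columns has disappeared.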
You should also look seriously at your data model.
In table customer you use data types like char(30), which is blank-padded and therefore always stores the full 30 characters in your row, even when you store just 'X'. Using a varchar(30) data type is much more efficient when many strings are shorter than the declared maximum width, because it takes up less space and thus requires fewer page reads (and writes on the intermediate data).
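A minimal sketch of such a conversion, using the column names and widths from the table definition in the question. It assumes that nothing relies on the blank padding and that a table rewrite under an exclusive lock is acceptable while it runs.

```sql
BEGIN;

-- Each ALTER COLUMN rewrites the column; rtrim() drops the padding blanks.
ALTER TABLE customer
    ALTER COLUMN cust_first_name    TYPE varchar(20) USING rtrim(cust_first_name),
    ALTER COLUMN cust_last_name     TYPE varchar(30) USING rtrim(cust_last_name),
    ALTER COLUMN cust_login         TYPE varchar(13) USING rtrim(cust_login),
    ALTER COLUMN cust_email_address TYPE varchar(50) USING rtrim(cust_email_address);

COMMIT;

-- Refresh planner statistics so the smaller row width is taken into account.
ANALYZE customer;
```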
Table stock uses numeric(7,2) for prices. The numeric data type gives exact results even when the data is subjected to many, many repeated operations, but it is also very slow. The double precision data type will be much faster and accurate enough in your scenario, bearing in mind that it is an inexact binary floating-point type. For presentation purposes you can round the value off to the desired precision.
As a suggestion, create a table stock_f with double precision data types instead of numeric, copy all data over from stock to stock_f and run the query on that table.
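The experiment could look roughly like this. It is a sketch under assumptions: the index names are hypothetical, and the copy carries no constraints, so stock_f is meant for timing comparisons only.

```sql
-- Copy stock into stock_f, casting the price columns to double precision.
CREATE TABLE stock_f AS
SELECT stock_soldate_k,
       stock_soltime_k,
       stock_customer_k,
       stock_ds_price::double precision AS stock_ds_price,
       stock_es_price::double precision AS stock_es_price,
       stock_ls_price::double precision AS stock_ls_price,
       stock_ws_price::double precision AS stock_ws_price
FROM stock;

-- Recreate the indexes mentioned in the question (names are made up here).
CREATE INDEX stock_f_customer_k_idx ON stock_f (stock_customer_k);
CREATE INDEX stock_f_soldate_k_idx  ON stock_f (stock_soldate_k);

ANALYZE stock_f;

-- Time the aggregation on the new table; run the same statement on stock to compare.
EXPLAIN ANALYZE
SELECT stock_customer_k,
       stock_soldate_k,
       sum((stock_ls_price - stock_ws_price - stock_ds_price + stock_es_price) * 0.5) AS total_yr
FROM stock_f
GROUP BY stock_customer_k, stock_soldate_k;
```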
Upvotes: 1