Reputation: 235
I have a table I created locally to use some of PG's window functions on a dataset that has about 4 million rows (originally a text file). Each row corresponds to a customer order.
CREATE TABLE orders
(
orderid integer,
customerid integer,
orderdate date,
status text,
amount money,
tax money,
customername text,
customerstate text
I have the database running locally in a i7 8gb RAM Windows 8 machine. I have btree indexes (indices?) on orderid, customerid and orderdate.
When I run the following query, it takes 300 seconds (appx). I was hoping with some basic tuning I could get it down to a minute, but I'm not a DBA. Anyone have tips?
select orderid, customername, orderdate,
rank() OVER (PARTITION BY customername ORDER BY orderdate ASC) as cust_ord_nbr
from orders
Upvotes: 1
Views: 173
Reputation: 658222
Partition by customerid
, like @Daniel commented. integer
is smaller and cheaper to sort. If you don't need customername
in the result, replace it with customerid
completely.
A multicolumn index can help (like @wildplasser commented). If it's a (mostly) read-only table, a "covering" index that allows index-only scans would be even faster - especially if included columns are small:
CREATE INDEX orders_nbr_idx ON orders (customerid, orderdate, orderid);
Adding orderid
to the index only makes sense if you get index-only scans out of it. If you need the customername
, add that, too. More:
If it's (mostly) a read-only table, execute the expensive query once and save the snapshot as MATERIALIZED VIEW
for reuse ...
You can do a couple of small things to reduce the memory footprint. After playing column tetris, this would save 0-7 bytes per row currently lost to padding:
CREATE TABLE orders (
orderid integer,
customerid integer,
amount money,
tax money,
orderdate date,
status text,
customername text,
customerstate text
);
If you write the result to another table (or MATERIALIZED VIEW
), it would save a bit to optimize the query in a similar fashion. rank()
produces a bigint
, by casting to int
you save 8 bytes per row (4 + 4 padding):
SELECT orderid, customername, orderdate
-- orderid, customerid, orderdate -- good enough?
, rank() OVER (PARTITION BY customerid
ORDER BY orderdate)::int AS cust_ord_nbr
FROM orders;
Upvotes: 2