Reputation: 76057
I have some tables that are more or less like this (I'll use a simpler domain so the explanation is clearer):
Trades
----------
ID
Seller_ID
Trade_Date
Sellers
-------
ID
Department_ID
And I want to get the latest trade made by each department. The query is something like:
SELECT Sellers.Department_ID, MAX(Trades.Trade_Date)
FROM Trades
JOIN Sellers ON Trades.Seller_ID = Sellers.ID
GROUP BY Sellers.Department_ID
The Trades table has an index on the date, so it can be used to speed up these queries, but I noticed that the query runs fast for some departments (hardcoding the ID) and very slowly for others.
I've deduced that this happens because of the huge difference in trade volume between departments. The database is doing a sequential scan over the sorted index to find the first occurrence, and for departments whose latest sale happened a long time ago it has to walk very far through the index.
My current solution is to store the latest query results in an auxiliary table and make the new queries incremental (filtering by the latest date already present in the auxiliary table). This solves the problem: the query runs very frequently, so the index scan now only needs to cover a few seconds' worth of trades.
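For concreteness, a minimal sketch of that auxiliary-table approach (the table and column names are made up, and the upsert syntax assumes PostgreSQL 9.5+ for ON CONFLICT):

CREATE TABLE Department_Latest_Trade (
    Department_ID     integer PRIMARY KEY,
    Latest_Trade_Date timestamp NOT NULL
);

-- Incremental refresh: only scan trades newer than anything already recorded.
INSERT INTO Department_Latest_Trade (Department_ID, Latest_Trade_Date)
SELECT Sellers.Department_ID, MAX(Trades.Trade_Date)
FROM Trades
JOIN Sellers ON Trades.Seller_ID = Sellers.ID
WHERE Trades.Trade_Date > (SELECT COALESCE(MAX(Latest_Trade_Date),
                                           '-infinity'::timestamp)
                           FROM Department_Latest_Trade)
GROUP BY Sellers.Department_ID
ON CONFLICT (Department_ID)
    DO UPDATE SET Latest_Trade_Date = EXCLUDED.Latest_Trade_Date;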
But I think there should be a more elegant solution to this. I know that if the aggregation were made by Seller rather than Department, a compound index would definitely help, but I don't think it is possible to build indexes that span different tables...
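This is the per-seller index I mean (the index name is just illustrative); the problem is that there is no way to build its cross-table equivalent covering Sellers.Department_ID and Trades.Trade_Date:

-- Lets MAX(Trade_Date) per seller be answered with a direct index lookup
-- instead of a scan of the date-sorted index.
CREATE INDEX idx_trades_seller_date ON Trades (Seller_ID, Trade_Date);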
Upvotes: 0
Views: 198
Reputation: 1714
Have you considered using a materialized view, or, if you use Postgres, building something like that on your own? If the insert operations are not as time-critical as the selects, then I would consider going this way.
There is an article about materialized views in Postgres:
http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
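As a sketch, assuming PostgreSQL 9.3+ where materialized views are built in (the article above covers rolling your own on older versions):

CREATE MATERIALIZED VIEW department_latest_trade AS
SELECT Sellers.Department_ID, MAX(Trades.Trade_Date) AS latest_trade_date
FROM Trades
JOIN Sellers ON Trades.Seller_ID = Sellers.ID
GROUP BY Sellers.Department_ID;

-- Refresh as often as the data is allowed to be stale; inserts into Trades
-- stay cheap, and selects hit the small precomputed table.
REFRESH MATERIALIZED VIEW department_latest_trade;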
Upvotes: 1