Reputation: 11862
I'm stress testing an app by adding loads and loads of items and forcing it to do lots of work.
select *, (
select price
from prices
WHERE widget_id = widget.id
ORDER BY id DESC
LIMIT 1
) as maxprice
FROM widgets
ORDER BY created_at DESC
LIMIT 20 OFFSET 0
The query is timing out on the test environment which is using the basic Heroku shared database. (193mb in use of the 5gig max.)
What will solve that time out issue? The prices update each hour, so every hour you get 8500x new rows.
It's hugely excessive amounts for the app (in reality it's unlikely it would ever have 8500 widgets) but I'm wondering what's appropriate to solve this?
Is my query stupid? (i.e. is it a bad style of query to do that subselect - my SQL knowledge is terrible, one of the goals of this project is to improve it!)
Or am I just hitting a limit of a shared db and should expect to move onto a dedicated db (e.g. the min $200 per month dedicated postgres instance from Heroku.) given the size of the prices table? Is there a deeper issue in terms of how I've designed the DB? (i.e. it's a one to many, one widget has many prices.) Is there a more sensible approach?
I'm totally new to the world of sql and queries etc. at scale, hence the utter ignorance expressed above. :)
Upvotes: 0
Views: 710
Reputation: 656952
@Dave wants the latest price
per widget. You could do that in sub-queries and LIMIT 1
per widget, but in modern PostgreSQL, a window function does the job more elegantly. Consider first_value()
/ last_value()
:
SELECT w.*
, first_value(p.price) OVER (PARTITION BY w.id
ORDER BY created_at DESC) AS latest_price
FROM (
SELECT *
FROM widgets
ORDER BY created_at DESC
LIMIT 20
) w
JOIN prices p ON p.widget_id = w.id
GROUP BY w.col1, w.col2 -- spell out all columns of w.*
SELECT w.*
, max(p.price) AS max_price
FROM (
SELECT *
FROM widgets
ORDER BY created_at DESC
LIMIT 20
) w
JOIN prices p ON p.widget_id = w.id
GROUP BY w.col1, w.col2 -- spell out all columns of w.*
Fix table aliases.
Retrieve all columns of widgets
like the question demonstrates
In PostgreSQL 8.3 you must spell out all non-aggregated columns of the SELECT
list in the GROUP BY
clause. In PostgreSQL 9.1 or later, the primary key column would cover the whole table. I quote the manual here:
Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause
I advice to never use mixed case identifiers like maxWidgetPrice
. Unquoted identifiers are folded to lower case by default in PostgreSQL. Do yourself a favor and use lower case identifiers exclusively.
Always use explicit JOIN conditions where possible. It's the canonical SQL way and it's more readable.
OFFSET 0
is just noise
However, the key to performance are the right indexes. I would go two indexes like these:
CREATE INDEX widgets_created_at_idx ON widgets (created_at DESC);
CREATE INDEX prices_widget_id_idx ON prices(widget_id, price DESC);
The second one is a multicolumn index, that should provide best performance for retrieving the maximum prize after you have determined the top 20 widgets using the first index. Not sure if PostgreSQL 8.3 (default on Heroku shared db) is already smart enough to make the most of it. PostgreSQL 9.1 certainly is.
For the latest price (see comments), use this index instead:
CREATE INDEX prices_widget_id_idx ON prices(widget_id, created_at DESC);
You don't have to (and shouldn't) just trust me. Test performance and query plans with EXPLAIN ANALYZE with and without indexes and see for yourself. Index creation should be very fast, even for a million rows.
If you consider to switch to a standalone PostgreSQL database on Heroku, you may be interested in this recent Heroku blog post:
Upvotes: 1
Reputation: 9150
I'm not quite clear on what you are asking, but here is my understanding:
Find the widgets you want to price. In this case it looks like you are looking for the most recent 20 widgets:
SELECT w.id
FROM widgets
ORDER BY created_at DESC
LIMIT 20 OFFSET 0
For each of the 20 widgets you found, it seems you want to find the highest associated price from the widget table:
SELECT s.id, MAX(p.price) AS maxWidgetPrice
FROM (SELECT w.id
FROM widgets
ORDER BY created_at DESC
LIMIT 20 OFFSET 0
) s -- widget subset
, prices p
WHERE s.id = p.widget_id
GROUP BY s.id
prices.widget_id needs to be indexed for this to be effective. You don't want to process the entire prices table each time if it is relatively large, just the subset of rows you need. EDIT: added "group by" (and no, this was not tested)
Upvotes: 0