Reputation: 103
I have a pricing history table with half a billion records. It is formatted like this:
Id, sku, vendor, price, datetime
What I want to do is get average price of all products by vendor for a certain date range. Most products are updated once every 3 days, but it varies.
So, this is the query I want to run:
SELECT
avg(price)
FROM table
WHERE
vendor='acme'
AND datetime > '12-15-2014'
AND datetime < '12-18-2014'
GROUP BY sku
This 3 day range is broad enough that i will for sure get at least one price sample, but some skus may have been sampled more than once, hence group by to try and get only one instance of each sku.
The problem is, this query runs and runs and doesn't seem to finish (more than 15 minutes). There are around 500k unique skus.
Any ideas?
edit: corrected asin to sku
Upvotes: 0
Views: 50
Reputation: 254916
For this query to be optimized by mysql you need to create a composite index
(vendor, datetime, asin)
IN THIS PARTICULAR ORDER (it mattters)
It also worth trying creating another one
(vendor, datetime, asin, price)
since it may perform better (since it's a so called "covering index").
The indexes with other order, like (datetime, vendor)
(which is suggested in another answer) are useless since the datetime
is used in a range comparison.
Few notes:
vendor='acme' AND datetime > '12-15-2014' AND datetime < '12-18-2014'
filter condition covers a small part of the whole table (say less than 10%)dd-mm-yyyy
literals (at least it's not documented, see references) so I assume it must be yyyy-mm-dd
insteaddatetime >= '2014-12-15'
instead.References:
Upvotes: 1
Reputation: 5569
You need an index to support your query. Suggest you create an index on vendor and datetime like so:
CREATE INDEX pricing_history_date_vendor ON pricing_history (datetime, vendor);
Also, I assume you wanted to group by sku
rather than undefined column asin
.
Not to mention your non-standard SQL date format MM-dd-yyyy
as pointed out by others in comments (should be yyyy-MM-dd
).
Upvotes: 0