user1452536
user1452536

Reputation: 103

Large mysql query with group by

I have a pricing history table with half a billion records. It is formatted like this:

Id, sku, vendor, price, datetime

What I want to do is get average price of all products by vendor for a certain date range. Most products are updated once every 3 days, but it varies.

So, this is the query I want to run:

SELECT 
    avg(price) 
FROM table 
WHERE 
    vendor='acme' 
    AND datetime > '12-15-2014' 
    AND datetime < '12-18-2014' 
GROUP BY sku

This 3 day range is broad enough that i will for sure get at least one price sample, but some skus may have been sampled more than once, hence group by to try and get only one instance of each sku.

The problem is, this query runs and runs and doesn't seem to finish (more than 15 minutes). There are around 500k unique skus.

Any ideas?

edit: corrected asin to sku

Upvotes: 0

Views: 50

Answers (2)

zerkms
zerkms

Reputation: 254916

For this query to be optimized by mysql you need to create a composite index

(vendor, datetime, asin)

IN THIS PARTICULAR ORDER (it mattters)

It also worth trying creating another one

(vendor, datetime, asin, price)

since it may perform better (since it's a so called "covering index").

The indexes with other order, like (datetime, vendor) (which is suggested in another answer) are useless since the datetime is used in a range comparison.

Few notes:

  • The index will be helpful if only the vendor='acme' AND datetime > '12-15-2014' AND datetime < '12-18-2014' filter condition covers a small part of the whole table (say less than 10%)
  • Mysql does not support dd-mm-yyyy literals (at least it's not documented, see references) so I assume it must be yyyy-mm-dd instead
  • Your comparison does not cover the first second of the December 15th, 2014. So you probably wanted datetime >= '2014-12-15' instead.

References:

Upvotes: 1

gknicker
gknicker

Reputation: 5569

You need an index to support your query. Suggest you create an index on vendor and datetime like so:

CREATE INDEX pricing_history_date_vendor ON pricing_history (datetime, vendor);

Also, I assume you wanted to group by sku rather than undefined column asin.

Not to mention your non-standard SQL date format MM-dd-yyyy as pointed out by others in comments (should be yyyy-MM-dd).

Upvotes: 0

Related Questions