Alex
Alex

Reputation: 1188

Very slow performance on thousands of millions records in MYSQL query

id    market_id      date          keyword                 sku            a             b         c  
1        1        2019-01-01    some text for this      QAB-XU-VV        3.1           2.4       3.5
2        2        2019-01-02    some text for text      ABC-XA-VV        2.1           4.1       1.2

This is an instance of table A

`id` : `int primary key auto increment`
`market_id` : `int(4)`  `INDEX`
`date` : `date`   `INDEX`
`keyword`: `varchar(191)` `utf8_general_ci`
`sku`: `varchar(191)` INDEX `utf8_general_ci`
`a, b, c` : `float` 

I need to query like this

SELECT
  sku,
  keyword,
  market_id,
  SUM(a),
  SUM(b),
  SUM(c),
FROM A
WHERE market_id = 2 AND date BETWEEN '2020-01-01' and '2020-02-02'
GROUP BY sku, keyword;

This query is expected about a hundred of thousand records. This query takes over 5 mins. so I used LIMIT and OFFSET but no improvements.

Please help me. Thank you.

UPDATED:

This shows EXPLAIN result. enter image description here

Upvotes: 1

Views: 1621

Answers (2)

Rick James
Rick James

Reputation: 142298

Gordon explained most things.

Note that GROUP BY and/or ORDER BY usually cause a break in the flow, thereby preventing most of the benefit of LIMIT. Well, actually there is a small benefit in LIMIT, but it is only the transmission effort and time for delivering a different-sized resultset.

As for improving further? As things stand with the current query and Gordon's "covering" index, no.

But... If this is a time-series of data and you could compute subtotals for the SUMs each day. Then fetching the grand total for 33 days can be made much faster; perhaps 10x faster.

Tentatively the summary table would have

PRIMARY KEY(market_id, date, sku, keyword)

And other columns of sum_a, sum_b, sum_c, and you would add new rows to the table each night.

I discuss "Summary tables" in http://mysql.rjweb.org/doc.php/summarytables

For indexing tips, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269703

For this query:

SELECT sku, keyword, market_id,
       SUM(a), SUM(b), SUM(c)
FROM A
WHERE market_id = 2 AND
      date BETWEEN '2020-01-01' and '2020-02-02'
GROUP BY sku, keyword, market_id;

I would recommend a composite index on A(market_id, date, sku, keyword, a, b, c).

This is a covering index so the original data pages do not need to be read. You can also use A(market_id, date).

Upvotes: 4

Related Questions