Sean Peace
Sean Peace

Reputation: 11

Mysql Group By and Sum performance issues

We have just one table with millions of rows where this query, as it stands takes 138 seconds to run on a server with a buffer pool size of 25G, the server itself linux with SSD drives.

I am wondering if anyone could suggest any improvements in MySQL settings or in the query itself that would reduce run time. We only have about 8 large member_id's that have this performance problem, the rest run under 5 seconds. We run multiple summary tables like this for rollup reporting.

select * 
from (
 SELECT distinct account_name AS source,SUM(royalty_amount) AS total_amount
 FROM royalty_stream
 WHERE member_id = '1050705'
    AND deleted = 0  
    AND period_year_quarter >= '2016_Q1'
    AND period_year_quarter <= '2016_Q2'    
 GROUP BY account_name
 ORDER BY total_amount desc 
 LIMIT 1
 ) a 

Upvotes: 1

Views: 80

Answers (1)

Jacobm001
Jacobm001

Reputation: 4539

I see a few obvious improvements.

Subselects

Don't use a subselect. This isn't a huge deal, but it makes little sense to add the overhead here.

Using Distinct

Is the distinct really needed here? Since you're grouping, it should be unnecessary overhead.

Data Storage Practices

Your period_year_quarter evaluation is going to be a hurdle. String comparisons are one of the slower things you can do, unfortunately. If you have the ability to update the data structure, I would highly recommend that you break period_year_quarter into two distinct, integer fields. One for the year, one for the quarter.

Is royalty_amount actually stored as a number, or are you making the database implicitly convert it every time? If so (surprisingly common mistake) converting that to a number will also help.

Indexing

You haven't explained what indexes are on this table. I'm hoping that you at least have one on member_id. If not, it should certainly be indexed.

I would further recommend an index on (member_id, period_year_quarter). If you took my advice from the previous section, that should be (member_id, year, quarter).


select 
  account_name as source
  , sum(royalty_amount) as total_amount
from 
  royalty_stream
where 
  member_id                = '1050705'
  and deleted              = 0
  and period_year_quarter between '2016_Q1' and '2016_Q2'
group by 
  account_name
order by 
  total_amount desc 
limit 1

Upvotes: 2

Related Questions