Ian
Ian

Reputation: 519

Rewriting mysql select to reduce time and writing tmp to disk

I have a mysql query that's taking several minutes which isn't very good as it's used to create a web page.

Three tables are used: poster_data contains information on individual posters. poster_categories lists all the categories (movies, art, etc) while poster_prodcat lists the posterid number and the categories it can be in e.g. one poster would have multiple lines for say, movies, indiana jones, harrison ford, adventure films, etc.

this is the slow query:

select * 
  from poster_prodcat, 
       poster_data, 
       poster_categories 
 where poster_data.apnumber = poster_prodcat.apnumber 
   and poster_categories.apcatnum = poster_prodcat.apcatnum 
   and poster_prodcat.apcatnum='623'  
ORDER BY aptitle ASC 
   LIMIT 0, 32

According to the explain:

explain

It was taking a few minutes. Poster_data has just over 800,000 rows, while poster_prodcat has just over 17 million. Other category queries with this select are barely noticeable, while poster_prodcat.apcatnum='623' has about 400,000 results and is writing out to disk

Upvotes: 0

Views: 9839

Answers (4)

Jon Black
Jon Black

Reputation: 16559

hope you find this helpful - http://pastie.org/1105206

drop table if exists poster;
create table poster
(
poster_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 


drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists poster_category;
create table poster_category
(
cat_id mediumint unsigned not null,
poster_id int unsigned not null,
primary key (cat_id, poster_id) -- note the clustered composite index !!
)
engine = innodb;

-- FYI http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

select count(*) from category
count(*)
========
500,000


select count(*) from poster
count(*)
========
1,000,000

select count(*) from poster_category
count(*)
========
125,675,688

select count(*) from poster_category where cat_id = 623
count(*)
========
342,820

explain
select
 p.*,
 c.*
from
 poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
 pc.cat_id = 623
order by
 p.name
limit 32;

id  select_type table   type    possible_keys   key     key_len ref                         rows
==  =========== =====   ====    =============   ===     ======= ===                         ====
1   SIMPLE      c       const   PRIMARY         PRIMARY 3       const                       1   
1   SIMPLE      p       index   PRIMARY         name    257     null                        32  
1   SIMPLE      pc      eq_ref  PRIMARY         PRIMARY 7       const,foo_db.p.poster_id    1   

select
 p.*,
 c.*
from
 poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
 pc.cat_id = 623
order by
 p.name
limit 32;

Statement:21/08/2010 
0:00:00.021: Query OK

Upvotes: 4

Adam Butler
Adam Butler

Reputation: 3037

Looks like the query is using the apptitle index to get the ordering but it is doing a full scan to filter the results. I think it might help if you have a composite index across both apptitle and apnumber on poster_data. MySQL might then be able to use this to do both the sort order and the filter.

create index data_title_anum_idx on poster_data(aptitle,apnumber);

Upvotes: 0

user21884
user21884

Reputation: 106

You need to have an index on apnumber in POSTER_DATA. Scanning 841,152 records is killing the performance.

Upvotes: 0

Kissaki
Kissaki

Reputation: 9237

The query you listed is how the final query will look like? (So they have the apcatnum=/ID/ ?)

where poster_data.apnumber=poster_prodcat.apnumber and poster_categories.apcatnum=poster_prodcat.apcatnum and poster_prodcat.apcatnum='623'

poster_prodcat.apcatnum='623' will vastly decrease the data-set mysql has to work on, thus this should be the first parsed part of the query.

Then go on to swap the where-comparisons so those minimizing the data-set the most will be parsed first.

You may also want to try sub-queries. I’m not sure that will help, but mysql probably won’t first get all 3 tables, but first do the sub-query and then the other one. This should minimize memory consumption while querying. Although this is not an option if you really want to select all columns (as you’re using a * there).

Upvotes: 0

Related Questions