rg88
rg88

Reputation: 20977

Optimizing a PHP page: MySQL bottleneck

I have a page that is taking 37 seconds to load. While it is loading it pegs MySQL's CPU usage through the roof. I did not write the code for this page and it is rather convoluted so the reason for the bottleneck is not readily apparent to me.

I profiled it (using kcachegrind) and find that the bulk of the time on the page is spent doing MySQL queries (90% of the time is spent in 25 different mysql_query calls).

The queries take the form of the following with the tag_id changing on each of the 25 different calls:

SELECT * FROM tbl_news WHERE news_id
 IN (select news_id from
 tbl_tag_relations WHERE tag_id = 20)

Each query is taking around 0.8 seconds to complete with a few longer delays thrown in for good measure... thus the 37 seconds to completely load the page.

My question is, is it the way the query is formatted with that nested select that is causing the problem? Or could it be any one of a million other things? Any advice on how to approach tackling this slowness is appreciated.

Running EXPLAIN on the query gives me this (but I'm not clear on the impact of these results... the NULL on primary key looks like it would be bad, yes? The number of results returned seems high to me as well as only a handful of results are returned in the end):

1    PRIMARY     tbl_news   ALL NULL    NULL    NULL    NULL    1318    Using where
2   DEPENDENT SUBQUERY  tbl_tag_relations   ref FK_tbl_tag_tags_1   FK_tbl_tag_tags_1   4   const   179 Using where

Upvotes: 2

Views: 1731

Answers (5)

cletus
cletus

Reputation: 625007

I'e addressed this point in Database Development Mistakes Made by AppDevelopers. Basically, favour joins to aggregation. IN isn't aggregation as such but the same principle applies. A good optimize will make these two queries equivalent in performance:

SELECT * FROM tbl_news WHERE news_id
 IN (select news_id from
 tbl_tag_relations WHERE tag_id = 20)

and

SELECT tn.*
FROM tbl_news tn
JOIN tbl_tag_relations ttr ON ttr.news_id = tn.news_id
WHERE ttr.tag_id = 20

as I believe Oracle and SQL Server both do but MySQL doesn't. The second version is basically instantaneous. With hundreds of thousands of rows I did a test on my machine and got the first version to sub-second performance by adding appropriate indexes. The join version with indexes is basically instantaneous but even without indexes performs OK.

By the way, the above syntax I use is the one you should prefer for doing joins. It's clearer than putting them in the WHERE clause (as others have suggested) and the above can do certain things in an ANSI SQL way with left outer joins that WHERE conditions can't.

So I would add indexes on the following:

  • tbl_news (news_id)
  • tbl_tag_relations (news_id)
  • tbl_tag_relations (tag_id)

and the query will execute almost instantaneously.

Lastly, don't use * to select all the columns you want. Name them explicitly. You'll get into less trouble as you add columns later.

Upvotes: 5

Ryan Bair
Ryan Bair

Reputation: 2634

Unfortunately MySQL doesn't do very well with uncorrelated subqueries like your case shows. The plan is basically saying that for every row on the outer query, the inner query will be performed. This will get out of hand quickly. Rewriting as a plain old join as others have mentioned will work around the problem but may then cause the undesired affect of duplicate rows.

For instance the original query would return 1 row for each qualifying row in the tbl_news table but this query:

SELECT news_id, name, blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)

would return 1 row for each matching tag. You could stick DISTINCT on there which should only have a minimal performance impact depending on the size of the dataset.

Not to troll too badly, but most other databases (PostgreSQL, Firebird, Microsoft, Oracle, DB2, etc) would handle the original query as an efficient semi-join. Personally I find the subquery syntax to be much more readable and easier to write, especially for larger queries.

Upvotes: 1

Joe Phillips
Joe Phillips

Reputation: 51100

If I understand correctly, this is just listing the news stories for a specific set of tags.

  1. First of all, you really shouldn't ever SELECT *

  2. Second, this can probably be
    accomplished within a single query, thus reducing the overhead cost of
    multiple queries. It seems like it is getting fairly trivial data so it could be retrieved within a single call instead of 20.

  3. A better approach to using IN might be to use a JOIN with a WHERE condition instead. When using an IN it will basically be a lot of OR statements.
  4. Your tbl_tag_relations should definitely have an index on tag_id

Upvotes: 2

Andy Moore
Andy Moore

Reputation: 865

The SQL Query itself is definitely your bottleneck. The query has a sub-query in it, which is the IN(...) portion of the code. This is essentially running two queries at once. You can likely halve (or more!) your SQL times with a JOIN (similar to what d03boy mentions above) or a more targeted SQL query. An example might be:

SELECT * 
FROM tbl_news, tbl_tag_relations 
WHERE tbl_tag_relations.tag_id = 20 AND
tbl_news.news_id = tbl_tag_relations.news_id 

To help SQL run faster you also want to try to avoid using SELECT *, and only select the information you need; also put a limiting statement at the end. eg:

SELECT news_title, news_body 
... 
LIMIT 5;

You also will want to look into the database schema itself. Make sure you are indexing all of the commonly referred to columns so that the queries will run faster. In this case, you probably want to check your news_id and tag_id fields.

Finally, you will want to take a look at the PHP code and see if you can make one single all-encompassing SQL query instead of iterating through several seperate queries. If you post more code we can help with that, and it will probably be the single greatest time savings for your posted problem. :)

Upvotes: 3

cgp
cgp

Reputation: 41381

select * 
 from tbl_news, tbl_tag_relations 
 where 
      tbl_tag_relations.tag_id = 20 and 
      tbl_news.news_id = tbl_tag_relations.news_id 
 limit 20

I think this gives the same results, but I'm not 100% sure. Sometimes simply limiting the results helps.

Upvotes: 1

Related Questions