Reputation: 272106
I've got a table structure that can be summarized as follows:
pagegroup
* pagegroupid
* name
has 3600 rows
page
* pageid
* pagegroupid
* data
references pagegroup; has 10000 rows; can have anything between 1-700 rows per pagegroup; the data column is of type mediumtext and the column contains 100k - 200kbytes data per row
userdata
* userdataid
* pageid
* column1
* column2
* column9
references page; has about 300,000 rows; can have about 1-50 rows per page
The above structure is pretty straight forwad, the problem is that that a join from userdata to page group is terribly, terribly slow even though I have indexed all columns that should be indexed. The time needed to run a query for such a join (userdata inner_join page inner_join pagegroup) exceeds 3 minutes. This is terribly slow considering the fact that I am not selecting the data column at all. Example of the query that takes too long:
SELECT userdata.column1, pagegroup.name
FROM userdata
INNER JOIN page USING( pageid )
INNER JOIN pagegroup USING( pagegroupid )
Please help by explaining why does it take so long and what can i do to make it faster.
Explain returns following gibberish:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE userdata ALL pageid 372420
1 SIMPLE page eq_ref PRIMARY,pagegroupid PRIMARY 4 topsecret.userdata.pageid 1
1 SIMPLE pagegroup eq_ref PRIMARY PRIMARY 4 topsecret.page.pagegroupid 1
SELECT
u.field2, p.pageid
FROM
userdata u
INNER JOIN page p ON u.pageid = p.pageid;
/*
0.07 sec execution, 6.05 sec fecth
*/
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u ALL pageid 372420
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 topsecret.u.pageid 1 Using index
SELECT
p.pageid, g.pagegroupid
FROM
page p
INNER JOIN pagegroup g ON p.pagegroupid = g.pagegroupid;
/*
9.37 sec execution, 60.0 sec fetch
*/
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE g index PRIMARY PRIMARY 4 3646 Using index
1 SIMPLE p ref pagegroupid pagegroupid 5 topsecret.g.pagegroupid 3 Using where
Keep medium/long text columns in a separate table if you run into performance problems such as this one.
Upvotes: 3
Views: 6252
Reputation: 238096
I'm assuming the userdata table is very large and does not fit in memory. MySQL would have to read the entire table from harddisk, even if it needs only two small columns.
You can try to eliminate the need for scanning the entire table by defining an index that contains everything the query needs. That way, the index is not a way to facilitate a search into the main table, but it's a shorthand version of the table itself. MySQL only has to read the shorthand table from disk.
The index could look like this:
column1, pageid
This has to be non-clustered, or it would be part of the big table, defeating its purpose. See this page for an idea on how MySQL decides which index to cluster. The easiest way seems to make sure you have a primary key on pageid, which will be clustered, so the secondary column1+pageid index will be non-clustered.
Upvotes: 2
Reputation: 45721
What's the data type and purpose of columnX in the userdata table? It should be noted that any text data type (i.e excluding char, varchar) forces any temporary tables to be created on disk. Now since you're doing a straight join without conditions, grouping or ordering, it probably won't need any temporary tables, except for aggregating the final result.
I think it would also be very helpful if you show us how your indexes are created. One thing to remember is that while InnoDB concatenates the primary key of the table to each index, MyISAM does not. This means that if you index column name and search for it with LIKE, but still want to get the id of the page group; Then the query would still need to visit the table to get the id instead of being able to retrieve it from the index.
What this means, in your case, if I understand your comment to apphacker correctly, is to get the name of each users pagegroups. The query optimizer would want to use the index for the join, but for each result it would also need to visit the table to retrieve the page group name. If your datatype on name is not bigger than a moderate varchar, i.e. no text, you could also create an index (id, name) which would enable the query to fetch the name directly from the index.
As a final try, you point out that the whole query would probably be faster if the mediumtext was not in the page table.
This would hopefully enable you to join quicker since no column in Pages take up much space. Then, when you needed to display a certain page, you join with the PageData table on the pageId-column to fetch the data needed to display a particular page.
Upvotes: 4
Reputation: 71830
Looks like you're doing a join on all rows on userdata
and then trying to select everything. That is every page
in a pagegroup
with userdata
. Where's the WHERE
clause? There's no LIMIT
, how many results did you want? Why don't you get your row count down on userdata
row in your explain
result, that should speed up the query. Heh.
Upvotes: 1
Reputation: 23216
The easy way to figure out what MySQL is doing with your query is to have it explain the query to you. Run this and have a look at the output:
EXPLAIN SELECT userdata.column1, pagegroup.name
FROM userdata
INNER JOIN page USING( pageid )
INNER JOIN pagegroup USING( pagegroupid )
MySQL will tell you in which order it processes the queries and what indexes it uses. The fact that you created indexes does not mean that MySQL actually uses them.
See also Optimizing queries with EXPLAIN
EDIT
The output of your EXPLAIN looks fine. It does a full table scan on the userdata table, but that is normal since you want to return all rows in it. The best way to optimize this is to rethink your application. Do you really need to return all 372K rows?
Upvotes: 2
Reputation: 338178
I would start with breaking the query up, to figure out if there is one slow and one fast part, or if both are slow (sorry, I'm no fan of the USING syntax, so I'm going to use ON):
SELECT
u.userdata, p.pageid
FROM
userdata u
INNER JOIN page p ON u.pageid = p.pageid
SELECT
p.pageid, g.pagegroupid
FROM
page
INNER JOIN pagegroup g ON p.pagegroupid = g.pagegroupid
What does that give you? Running these with EXPLAIN EXTENDED
will provide additional hints.
Upvotes: 1
Reputation: 881605
One possible issue is that MySQL uses only one index per query and maybe you don't have a single index with those columns -- or MySQL's query optimizer isn't picking it. What does EXPLAIN SELECT
&c tell you here?
Upvotes: 1