Reputation: 1583
I'm running the following query on a Macbook Pro 2.53ghz with 4GB of Ram:
SELECT
c.id AS id,
c.name AS name,
c.parent_id AS parent_id,
s.domain AS domain_name,
s.domain_id AS domain_id,
NULL AS stats
FROM
stats s
LEFT JOIN stats_id_category sic ON s.id = sic.stats_id
LEFT JOIN categories c ON c.id = sic.category_id
GROUP BY
c.name
It takes about 17 seconds to complete.
EXPLAIN:
alt text http://img7.imageshack.us/img7/1364/picture1va.png
The tables:
Information:
Number of rows: 147397
Data size: 20.3MB
Index size: 1.4MB
Table:
CREATE TABLE `stats` (
`id` int(11) unsigned NOT NULL auto_increment,
`time` int(11) NOT NULL,
`domain` varchar(40) NOT NULL,
`ip` varchar(20) NOT NULL,
`user_agent` varchar(255) NOT NULL,
`domain_id` int(11) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
`referrer` varchar(400) default NULL,
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147398 DEFAULT CHARSET=utf8
Information second table:
Number of rows: 1285093
Data size: 11MB
Index size: 17.5MB
Second table:
CREATE TABLE `stats_id_category` (
`stats_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
KEY `stats_id` (`stats_id`,`category_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Information third table:
Number of rows: 161
Data size: 3.9KB
Index size: 8KB
Third table:
CREATE TABLE `categories` (
`id` int(11) NOT NULL auto_increment,
`parent_id` int(11) default NULL,
`name` varchar(40) NOT NULL,
`questions_category_id` int(11) NOT NULL default '0',
`rank` int(2) NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=205 DEFAULT CHARSET=latin1
Hopefully someone can help me speed this up.
Upvotes: 1
Views: 1262
Reputation: 562871
I see several WTF's in your query:
You use two LEFT OUTER JOIN
s but then you group by the c.name
column which might have no matches. So perhaps you don't really need an outer join? If that's the case, you should use an inner join, because outer joins are often slower.
You are grouping by c.name
but this gives ambiguous results for every other column in your select-list. I.e. there might be multiple values in these columns in each grouping by c.name
. You're lucky you're using MySQL, because this query would simply give an error in any other RDBMS.
This is a performance issue because the GROUP BY
is likely causing the "using temporary; using filesort
" you see in the EXPLAIN. This is a notorious performance-killer, and it's probably the single biggest reason this query is taking 17 seconds. Since it's not clear why you're using GROUP BY
at all (using no aggregate functions, and violating the Single-Value Rule), it seems like you need to rethink this.
You are grouping by c.name
which doesn't have a UNIQUE
constraint on it. You could in theory have multiple categories with the same name, and these would be lumped together in a group. I wonder why you don't group by c.id
if you want one group per category.
SELECT NULL AS stats
: I don't understand why you need this. It's kind of like creating a variable that you never use. It shouldn't harm performance, but it's just another WTF that makes me think you haven't thought this query through very well.
You say in a comment you're looking for number of visitors per category. But your query doesn't have any aggregate functions like SUM()
or COUNT()
. And your select-list includes s.domain
and s.domain_id
which would be different for every visitor, right? So what value do you expect to be in the result set if you only have one row per category? This isn't really a performance issue either, it just means your query results don't tell you anything useful.
Your stats_id_category
table has an index over its two columns, but no primary key. So you can easily get duplicate rows, and this means your count of visitors may be inaccurate. You need to drop that redundant index and use a primary key instead. I'd order category_id
first in that primary key, so the join can take advantage of the index.
ALTER TABLE stats_id_category DROP KEY stats_id,
ADD PRIMARY KEY (category_id, stats_id);
Now you can eliminate one of your joins, if all you need to count is the number of visitors:
SELECT c.id, c.name, c.parent_id, COUNT(*) AS num_visitors
FROM categories c
INNER JOIN stats_id_category sic ON (sic.category_id = c.id)
GROUP BY c.id;
Now the query doesn't need to read the stats
table at all, or even the stats_id_category table
. It can get its count simply by reading the index of the stats_id_category
table, which should eliminate a lot of work.
Upvotes: 2
Reputation: 66152
I agree with Bill. Point 2 is very important. The query doesn't even make logical sense. Also, with the simple fact that there is no where statement means that you have to pull back every row in the stats table, which seems to be around 140000. It then has to sort all that data, so that it can perform the GROUP BY. This is because sort [ O(n log n)] and then find duplicates [ O(n) ] is much faster than just finding duplicates without sorting the data set [ O(n^2)?? ].
Upvotes: 0
Reputation: 12449
Harrison is right; we need the other table. I would start by adding an index on category_id to stats_id_category, though.
Upvotes: 0
Reputation: 7144
You are missing the third table in the information provided (categories).
Also, it seems odd that you are doing a LEFT JOIN and then using the right table (which might be all NULLS) in the GROUP BY. You will end up grouping all of the non-matching rows together as a result, is that what you intended?
Finally, can you provide an EXPLAIN for the SELECT?
Upvotes: 0