Reputation: 529
I have a database with the following three tables:
matches table has 200,000 matches...
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
heroes table has ~100 heroes...
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
matches_heroes table has 2,000,000 relationships (10 random heroes per match)...
CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8
The following query takes over 1 second, which seems pretty slow to me for something so simple:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
Removing only the WHERE clause doesn't help, but if I take out the INNER JOIN also, like so:
SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches
...it only takes 0.05 seconds. It seems that INNER JOIN is very costly. I don't have much experience with joins. Is this normal or am I doing something wrong?
UPDATE #1: Here's the EXPLAIN result.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE matches_heroes ref match_id,hero_id,match_id_hero_id hero_id 2 const 34742
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 mydatabase.matches_heroes.match_id 1 Using index
UPDATE #2: After listening to you guys, I think it's working properly and this is simply as fast as it gets. Please let me know if you disagree. Thanks for all the help. I really appreciate it.
Upvotes: 3
Views: 3385
Reputation: 1527
Use (not a problem on MySql InnerJoin, my bad).COUNT(matches.match_id)
instead of count(*)
, as when using joins it's best to not use the *
as it does extra computation. Using columns from the join are the best way ensure you are not requesting any other operations.
Also you should verify that you have all keys defragmented, and enough ram free for the index to load in memory
Try to add a composed index for match_id,hero_id
as it should give better performance.
ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
I wasn't satisfied with the accepted answer, that mysql is that slow for just 2 mill records and I runed benchmarks on my ubuntu PC (i7 processor, with standard HDD).
-- pre-requirements
CREATE TABLE seq_numbers (
number INT NOT NULL
) ENGINE = MYISAM;
DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
BEGIN
DECLARE i INT;
SET i = MinVal;
START TRANSACTION;
WHILE i <= MaxVal DO
INSERT INTO seq_numbers VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertSeq(1,200000)
;
ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;
-- create tables
-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100
-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000
-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000
-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes
-- get random hero_id
SET @randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);
-- test 1
SELECT SQL_NO_CACHE @randHeroId,COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s
-- test 2: adding some complexity
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);
SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s
Conclusion: The test results are about 20x faster, and my server load was about 80% before testing as it's not a dedicated mysql server and had other cpu intensive tasks running, so if you run the whole script (from above) and get lower results it can be because:
Note that by using the test script, the generating of new data rules out the index fragmentation problem. Hope this helps, and ask if you have issues in testing this.
obs:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5`
is the equivalent to:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes
WHERE hero_id = 5`
So you wouldn't require a join, if that's the count you need, but I'm guessing that was just an example.
Upvotes: 1
Reputation: 94859
So you say reading a table of 200,000 records is faster than reading a table of 2,000,000 records, finding the desired ones, then take them all to find matching records in the 200,000 record table?
And this surprises you? It's simply a lot of more work for the dbms. (It can even be, btw, that the dbms decides not to use the hero_id index when it considers a full table scan to be faster.)
So in my opinion there is nothing wrong with what is happening here.
Upvotes: 1