DaiBu
DaiBu

Reputation: 529

Why does adding an INNER JOIN make this query so slow?

I have a database with the following three tables:

matches table has 200,000 matches...

CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

heroes table has ~100 heroes...

CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

matches_heroes table has 2,000,000 relationships (10 random heroes per match)...

CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8

The following query takes over 1 second, which seems pretty slow to me for something so simple:

SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5

Removing only the WHERE clause doesn't help, but if I take out the INNER JOIN also, like so:

SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches

...it only takes 0.05 seconds. It seems that INNER JOIN is very costly. I don't have much experience with joins. Is this normal or am I doing something wrong?

UPDATE #1: Here's the EXPLAIN result.

id  select_type  table          type   possible_keys                     key     key_len  ref                                rows  Extra  
1   SIMPLE       matches_heroes ref    match_id,hero_id,match_id_hero_id hero_id 2        const                              34742
1   SIMPLE       matches        eq_ref PRIMARY                           PRIMARY 8        mydatabase.matches_heroes.match_id 1     Using index

UPDATE #2: After listening to you guys, I think it's working properly and this is simply as fast as it gets. Please let me know if you disagree. Thanks for all the help. I really appreciate it.

Upvotes: 3

Views: 3385

Answers (2)

Stefan Rogin
Stefan Rogin

Reputation: 1527

Use COUNT(matches.match_id) instead of count(*), as when using joins it's best to not use the * as it does extra computation. Using columns from the join are the best way ensure you are not requesting any other operations. (not a problem on MySql InnerJoin, my bad).

Also you should verify that you have all keys defragmented, and enough ram free for the index to load in memory

Update 1:


Try to add a composed index for match_id,hero_id as it should give better performance.

ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)


Update 2:


I wasn't satisfied with the accepted answer, that mysql is that slow for just 2 mill records and I runed benchmarks on my ubuntu PC (i7 processor, with standard HDD).

-- pre-requirements

CREATE TABLE seq_numbers (
    number INT NOT NULL
) ENGINE = MYISAM;


DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
    BEGIN
        DECLARE i INT;
        SET i = MinVal;
        START TRANSACTION;
        WHILE i <= MaxVal DO
            INSERT INTO seq_numbers VALUES (i);
            SET i = i + 1;
        END WHILE;
        COMMIT;
    END$$
DELIMITER ;

CALL InsertSeq(1,200000)
;

ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;

--  create tables

-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;

CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;

CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100

-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000

-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000

-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes


-- get random hero_id
SET @randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);


-- test 1 

SELECT SQL_NO_CACHE @randHeroId,COUNT(*) AS match_count
FROM matches as a 
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s


-- test 2: adding some complexity 
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);

SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a 
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s

Conclusion: The test results are about 20x faster, and my server load was about 80% before testing as it's not a dedicated mysql server and had other cpu intensive tasks running, so if you run the whole script (from above) and get lower results it can be because:

  1. you have a shared host, and the load is too big. In this case there isn't much you can do: you either complain to your current host, pay for a better host/vm or try another host
  2. your configured key_buffer_size(for MyISAM) or innodb_buffer_pool_size(for innoDB) is too small, the optimum size would be over 150MB
  3. your available ram is not enough, you would require about 100 - 150 mb of ram for the indexes to be loaded into memory. solution: free up some ram or buy more of it

Note that by using the test script, the generating of new data rules out the index fragmentation problem. Hope this helps, and ask if you have issues in testing this.


obs:


SELECT SQL_NO_CACHE COUNT(*) AS match_count 
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id 
WHERE hero_id = 5` 

is the equivalent to:

SELECT SQL_NO_CACHE COUNT(*) AS match_count 
FROM matches_heroes 
WHERE hero_id = 5` 

So you wouldn't require a join, if that's the count you need, but I'm guessing that was just an example.

Upvotes: 1

Thorsten Kettner
Thorsten Kettner

Reputation: 94859

So you say reading a table of 200,000 records is faster than reading a table of 2,000,000 records, finding the desired ones, then take them all to find matching records in the 200,000 record table?

And this surprises you? It's simply a lot of more work for the dbms. (It can even be, btw, that the dbms decides not to use the hero_id index when it considers a full table scan to be faster.)

So in my opinion there is nothing wrong with what is happening here.

Upvotes: 1

Related Questions