Jordy
Jordy

Reputation: 1049

Caching big data, alternative query or other indexes?

I'm with a problem, I am working on highscores, and for those highscores you need to make a ranking based on skill experience and latest update time (to see who got the highest score first incase skill experience is the same).

The problem is that with the query I wrote, it takes 28 (skills) x 0,7 seconds to create a personal highscore page to see what their rank is on the list. Requesting this in the browser is just not doable, it takes way too long for the page to load and I need a solution for my issue.

MySQL version: 5.5.47

The query I wrote:

SELECT rank FROM  
    (
        SELECT hs.playerID, (@rowID := @rowID + 1) AS rank 
        FROM 
            (
                SELECT hs.playerID 
                FROM highscores AS hs
                INNER JOIN overall AS o  ON hs.playerID = o.playerID 
                WHERE hs.skillID = ?
                  AND o.game_mode = ? 
                ORDER BY hs.skillExperience DESC,
                         hs.updateTime ASC
            ) highscore,
        (SELECT @rowID := 0) r
    ) data
WHERE data.playerID = ?

As you can see I first have to create a whole resultset that gives me a full ranking for that game mode and skill, and then I have to select the rank based on the playerID after that, the problem is that I cannot let the query run untill it finds the result, because mysql doesn't offer such function, if I'd specifiy where data.playerID = ? in the query above, it would give back 1 result, meaning the ranking will be 1 as well.

The highscores table has 550k rows

What I have tried was storing the resultset for each skillid/gamemode combination in a temp table json_encoded, tried storing on files, but it ended up being quite slow as well, because the files are really huge and it takes time to process.

Highscores table:

CREATE TABLE `highscores` (
    `playerID` INT(11) NOT NULL,
    `skillID` INT(10) NOT NULL,
    `skillLevel` INT(10) NOT NULL,
    `skillExperience` INT(10) NOT NULL,
    `updateTime` BIGINT(20) NOT NULL,
    PRIMARY KEY (`playerID`, `skillID`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;

Overall table has got 351k rows

Overall table:

CREATE TABLE `overall` (
    `playerID` INT(11) NOT NULL,
    `playerName` VARCHAR(50) NOT NULL,
    `totalLevel` INT(10) NOT NULL,
    `totalExperience` BIGINT(20) NOT NULL,
    `updateTime` BIGINT(20) NOT NULL,
    `game_mode` ENUM('REGULAR','IRON_MAN','IRON_MAN_HARDCORE') NOT NULL DEFAULT 'REGULAR',
    PRIMARY KEY (`playerID`, `playerName`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;

Explain Select result from the query:

enter image description here

Does anybody have a solution for me?

Upvotes: 0

Views: 56

Answers (3)

Rick James
Rick James

Reputation: 142296

No useful index for WHERE

The last 2 lines of the EXPLAIN (#3 DERIVED):

           WHERE hs.skillID = ?
             AND o.game_mode = ? 

Since neither table has a suitable index to use for the WHERE clause, to optimizer decided to do a table scan of one of them (overall), then reach into the other (highscores). Having one of these indexes would help, at least some:

highscores: INDEX(skillID)
overall: INDEX(game_mode, ...) -- note that an index only on a low-cardinality ENUM is rarely useful.

(More in a minute.)

No useful index for ORDER BY

The optimizer sometimes decides to use an index for the ORDER BY instead of for the WHERE. But

            ORDER BY hs.skillExperience DESC,
                     hs.updateTime      ASC

cannot use an index, even though both are in the same table. This is because DESC and ASC are different. Changing ASC to DESC would have an impact on the resultset, but would allow

INDEX(skillExperience, updateTime)

to be used. Still, this may not be optimal. (More in a minute.)

Covering index

Another form of optimization is to build a "covering index". That is an index that has all the columns that the SELECT needs. Then the query can be performed entirely in the index, without reaching over to the data. The SELECT in question is the innermost:

              ( SELECT  hs.playerID
                    FROM  highscores AS hs
                    INNER JOIN  overall AS o ON hs.playerID = o.playerID
                    WHERE  hs.skillID = ?
                      AND  o.game_mode = ?
                    ORDER BY  hs.skillExperience DESC, hs.updateTime ASC 
              ) highscore,

For hs: INDEX(skillID, skillExperience, updateTime, playerID) is "covering" and has the most important item (skillID, from the WHERE) first.

For o: INDEX(game_mode, playerID) is "covering". Again, game_mode must be first.

If you change the ORDER BY to be DESC and DESC, then add another index for hs: INDEX(skillExperience, updateTime, skillID, playerID). Now the first 2 columns must be in that order.

Conclusion

It is not obvious which of those indexes the optimizer would prefer. I suggest you add both and let it choose.

I believe that (1) the innermost query is consuming the bulk of time, and (2) there is nothing to optimize in the outer SELECTs. So, I leave that as my recommendation.

Much of this is covered in my Indexing Cookbook.

Upvotes: 1

Ogre Codes
Ogre Codes

Reputation: 19621

It doesn't look like you really need to rank everyone, you just want to find out how many people are ahead of the current player. You should be able to get a simple count of how many players have better scores & dates than the current player which represents the current player's ranking.

    SELECT count(highscores.id) as rank FROM highscores
       join highscores playerscore 
           on playerscore.skillID = highscores.skillID
          and playerscore.gamemode  = highscores.gamemode
      where highscores.skillID = ? 
        AND highscores.gamemode = ?
        and playerscore.playerID = ?
        and (highscores.skillExperience > playerscore.skillExperience
          or (highscores.skillExperience = playerscore.skillExperience
             and highscores.updateTime > playerscore.updateTime));

(I joined the table to itself and aliased the second instance as playerscore so it was slightly less confusing)

You could probably even simplify it to one query by grouping and parsing the results within your language of choice.

    SELECT 
        highscores.gamemode as gamemode,
        highscores.skillID as skillID, 
        count(highscores.id) as rank 
       FROM highscores
       join highscores playerscore 
           on playerscore.skillID = highscores.skillID
          and playerscore.gamemode  = highscores.gamemode
      where playerscore.playerID = ? 
        and (highscores.skillExperience > playerscore.skillExperience
          or (highscores.skillExperience = playerscore.skillExperience
             and highscores.updateTime > playerscore.updateTime));
      group by highscores.gamemode, highscores.skillID;

Not quite sure about the grouping bit though.

Upvotes: 0

Deep
Deep

Reputation: 2512

Important subanswer: How frequently change rank of all players? Hmm.. Need explain.. You want realtime statistics? No, you dont want realtime )) You must select time interval for update statistics, e.g. 10 minutes. For this case you can run cronjob for insert new rank statistics into separated table like this:

/* lock */
TRUNCATE TABLE rank_stat; /* maybe update as unused/old for history) instead truncate */
INSERT INTO rank_stat (a, b, c, d) <your query here>;
/* unlock */

and users (browsers) will select readonly statistics from this table (can be split to pages).

But if rank stat not frequently change, e.g. you can recalculate it for all wanted game events and/or acts/achievs of players.

This is recommedations only. Because you not explain full environment. But I think you can found right solution with this recommendations.

Upvotes: 0

Related Questions