axldns
axldns

Reputation: 55

Highscores on multiple columns, efficient query, right approach

Let's say we've got high scores table with columns app_id, best_score, best_time, most_drops, longest_something and couple more. I'd like to collect top three results ON EACH CATEGORY grouped by app_id?

For now I'm using separate rank queries on each category in a loop:

SELECT app_id, best_something1,
FIND_IN_SET( best_something1, 
(SELECT GROUP_CONCAT( best_something1 
ORDER BY best_something1 DESC) 
FROM highscores )) AS rank 
FROM highscores 
ORDER BY best_something1 DESC LIMIT 3;

Two things worth to add:

  1. All columns for specific app are being updated at the same time (can consider creating a helper table).
  2. the result of prospective "turbo query" might be requested quite often - as often as updating the values.

I'm quite basic with SQL and suspect that it has many more commands that combined together could do the magic?

What I'd expect from this post is that some wise owl would at least point the direction where to go or how to go.

The sample table: http://sqlfiddle.com/#!2/eef053/1

Here is sample result too (already in json format, sry):

{"total_blocks":[["13","174","1"],["9","153","2"],["10","26","3"]],"total_games":[["13","15","1"],["9","12","2"],["10","2","3"]],"total_score":[["13","410","1"],["9","332","2"],["11","88","3"]],"aver_pps":[["11","4.34011","1"],["13","2.64521","2"],["12","2.60623","3"]],"aver_drop_per_game":[["11","20","1"],["10","13","2"],["9","12.75","3"]],"aver_drop_val":[["11","4.4","1"],["13","2.35632","2"],["9","2.16993","3"]],"aver_score":[["11","88","1"],["9","27.6667","2"],["13","27.3333","3"]],"best_pps":[["13","4.9527","1"],["11","4.34011","2"],["9","4.13076","3"]],"most_drops":[["11","20","1"],["9","16","2"],["13","16","2"]],"longest_drop":[["9","3","1"],["13","2","2"],["11","2","2"]],"best_drop":[["11","42","1"],["13","36","2"],["9","30","3"]],"best_score":[["11","88","1"],["13","78","2"],["9","58","3"]]}

Upvotes: 0

Views: 111

Answers (2)

spencer7593
spencer7593

Reputation: 108390

I'd test something like this query, to see if the performance is any better or not. I think this comes pretty close to satisfying the specification:

 ( SELECT 99 AS seq_
        , a.category
        , CONVERT(a.val,DOUBLE) AS val
        , FIND_IN_SET(a.val,r.highest_vals) AS rank
        , a.user_id
     FROM ( SELECT 'total_blocks' AS category
                 , b.`total_blocks` AS val
                 , b.user_id
              FROM app b
             ORDER BY b.`total_blocks` DESC
             LIMIT 3
          ) a
    CROSS
     JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
              FROM ( SELECT t.`total_blocks` AS val
                       FROM app t
                      ORDER BY t.`total_blocks` DESC
                      LIMIT 3
                   ) s
          ) r
    ORDER BY a.val DESC
 )
 UNION ALL
 ( SELECT 97 AS seq_
        , a.category
        , CONVERT(a.val,DOUBLE) AS val
        , FIND_IN_SET(a.val,r.highest_vals) AS rank
        , a.user_id
     FROM ( SELECT 'XXX' AS category
                 , b.`XXX` AS val
                 , b.user_id
              FROM app b
             ORDER BY b.`XXX` DESC
             LIMIT 3
          ) a
    CROSS
     JOIN ( SELECT GROUP_CONCAT(s.val ORDER BY s.val DESC) AS highest_vals
              FROM ( SELECT t.`XXX` AS val
                       FROM app t
                      ORDER BY t.`XXX` DESC
                      LIMIT 3
                   ) s
          ) r
    ORDER BY a.val DESC
 )
 ORDER BY seq_ DESC, val DESC

To unpack this a little bit... this is essentially separate queries that are combined with UNION ALL set operator.

Each of the queries returns a literal value to allow for ordering. (In this case, I've given the column a rather anonymous name seq_ (sequence)... if the specific order isn't important, then this could be removed.

Each query is also returning a literal value that tells which "category" the row is for.

Because some of the values returned are INTEGER, and others are FLOAT, I'd cast all of those values to floating point, so the datatypes of each query line up.

For the FLOAT (floating point) type values, there can be a problem with comparison. So I'd go with casting those to decimal and stringing them together into a list using GROUP_CONCAT (as the original query does).

Since we are returning only three rows from each query, we only need to concatenate together the three largest values. (If there's a two way "tie" for first place, we'll return rank values of 1, 1, 3.)

Suitable indexes for each query will improve performance for large sets.

... ON app (total_blocks, user_id)
... ON app (best_pps,user_id)
... ON app (XXX,user_id)

Upvotes: 1

Christopher McGowan
Christopher McGowan

Reputation: 1391

When I encounter this scenario, I prefer to employ the UNION clause, and combine the queries tailored to each ORDERing and LIMIT.

http://dev.mysql.com/doc/refman/5.1/en/union.html

UNION combines the result rows vertically (top 3 rows for 5 sort categories yields 15 rows).

For your specific purpose, you might then pivot them as sub-SELECTs, rolling them up with GROUP_CONCAT GROUPed on user so that each has the delimited list.

Upvotes: 2

Related Questions