Calculate quantiles for a score/ranking system (PHP / MySQL)

There are two tables:

Table user:

+----+-----------+
| id | user_name |
+----+-----------+
|  1 |   Alice   |
|  2 |   Steve   |
|  3 |   Tommy   |
+----+-----------+

Table result:

+----+---------+-------+-------------+
| id | user_id | score |  timestamp  |
+----+---------+-------+-------------+
|  1 |    1    |   22  |  1410793838 |
|  2 |    1    |   16  |  1410793911 |
|  3 |    2    |    9  |  1410793920 |
|  4 |    1    |   27  |  1410794007 |
|  5 |    3    |   32  |  1410794023 |
+----+---------+-------+-------------+

What I have so far is a "top 3", which works great and looks like this:

SELECT MAX(m.score) AS score, u.user_name
FROM result AS r
INNER JOIN user AS u ON r.user_id = u.id
GROUP BY r.user_id
ORDER BY r.score DESC
LIMIT 3;

+-------+-----------+
| score | user_name |
+-------+-----------+
|   32  |   Tommy   |
|   27  |   Alice   |
|    9  |   Steve   |
+-------+-----------+

The table is actually filled with hundreds of results, this is just an example. I'm looking for a compact algorithm to get the rank of a specific user in relation to all other users in %. The goal is to output something like "you are in the top 5%/10%/20%/50%" or "you are below average". While it's easy to determine if someone is below average (score < AVG(score)), I have no clue how to determine the other ranks.

Upvotes: 2

Views: 1875

Answers (3)

Mihai
Mihai

Reputation: 26784

Here is another version

    SELECT user_name, score,(CASE 
        WHEN score BETWEEN  @max-((@max-@min)/10) AND @max THEN  '10'
        WHEN score BETWEEN  @max-((@max-@min)/5) AND @max THEN  '20'
        WHEN score BETWEEN  @max-((@max-@min)/2) AND @max THEN  '50'
        ELSE 'more50' 
        END) as rangescore,
user_name 
FROM result r
INNER JOIN user u ON r.user_id = u.id,
(SELECT @max :=  MAX(score) FROM result)x,
(SELECT @min :=  MIN(score) FROM result)y
ORDER BY score DESC

You can use AVG(score) instead of MAX f you want to compare the average score of the user.

Or remove the aggregate functions and GROUP BY if you want every score.

FIDDLE

FIDDLE GROUP

Upvotes: 1

Raphael M&#252;ller
Raphael M&#252;ller

Reputation: 2200

Ok, now with your new declarations, I will provide you another solution:

As you said you can use PHP and MySQL togheter, I provide you a combined one.

You want to compute your quantiles (to have an idea what this is quantiles on wikipedia), because if you have a top scorer with about 10000 points and all other players just have 100 points and below, the ones with 100 points are in the top 5% as player, but there score is under 50% of the top scorer.

With this in mind, we can count the number of players, get the score at the wanted percent of players and compare, were the players score fits in.

First select all the maximas, minimas, counter etc.

SELECT
 COUNT(`result`.`score`) `count`,
 MAX(`result`.`score`) `max`,
 MIN(`result`.`score`) `min`,
 AVG(`result`.`score`) `avg`
FROM
 `result`
GROUP BY
 `result`.`user_id`
ORDER BY
 `result`.`score` DESC

After you have the complete data, you can compute your quantiles.

SELECT
 `result`.`score`
FROM
 `result`
GROUP BY
 `result`.`user_id`
ORDER BY
 `result`.`score` DESC
LIMIT FLOOR($count*$percent), 1
//where $count is the value from the first query and $percent is the wanted quantile e.g. 5%

After that you know the value for your quantile and you can compare the actual value with the ones here.

//where $percentNN is the score from the previous query
if($score > $percent50) echo "top 50%";
if($score > $percent20) echo "top 20%";
if($score > $percent10) echo "top 10%";
if($score > $percent5) echo "top 5%";

maybe, we can combine the multiple queries to one.

Upvotes: 0

Alma Do
Alma Do

Reputation: 37365

If I got all correctly, it's just relative maximum calculation:

SELECT
  user_name,
  MAX(score) AS max_score,
  CASE
    WHEN ROUND(100*MAX(score)/maximum, 2)>=95 THEN 'In top 5%'
    WHEN ROUND(100*MAX(score)/maximum, 2)>=90 THEN 'In top 10%'
    WHEN ROUND(100*MAX(score)/maximum, 2)>=75 THEN 'In top 25%'
    WHEN ROUND(100*MAX(score)/maximum, 2)>=50 THEN 'In top 50%'
    WHEN ROUND(100*MAX(score)/maximum, 2)>=0 THEN 'Below average'
  END AS score_mark
FROM
  `result`
    INNER JOIN `user`
      ON `result`.user_id=`user`.id
    CROSS JOIN
      (SELECT MAX(score) AS maximum FROM `result`) AS init
GROUP BY
  user_id

So, counting from maximum score per all table and grouping it for specific user. Check the fiddle.

As mentioned below, this counting method involves simple way to determine average (i.e. all it's based on total maximum). This may be not the thing which is needed. By that I mean, that if question is about calculation relative position according to other scores (not maximum) - then it's more complicated:

  SELECT
    maxs.*,
    @num:=@num+1 AS order_num,
    CASE
      WHEN 100*(@num-1)/(user_count-1) <=   5 THEN 'In top 5%'
      WHEN 100*(@num-1)/(user_count-1) <=  10 THEN 'In top 10%'
      WHEN 100*(@num-1)/(user_count-1) <=  25 THEN 'In top 25%'
      WHEN 100*(@num-1)/(user_count-1) <=  50 THEN 'In top 50%'
      WHEN 100*(@num-1)/(user_count-1) <= 100 THEN 'Below average'
    END AS score_mark
  FROM
    (SELECT
      user_name,
      MAX(score) AS max_score
    FROM
      `result`
        INNER JOIN `user`
          ON `result`.user_id = `user`.id
    GROUP BY
      user_id
    ORDER BY
      max_score DESC) AS maxs
    CROSS JOIN
      (SELECT 
        @num:=0,
        COUNT(DISTINCT user_id) AS user_count
      FROM
        `result`) AS init

-since now we must first re-count our positions and later build relative calculations over that. Here is the corresponding fiddle. Here, however, I'm applying linear formula to count 1-st position as "zero" and last position as "100". If that's not an intention (there will be edge-cases, like "3" in "50%" for "5 total" in fiddle) - then you may change divisor to user_count

Upvotes: 2

Related Questions