DevilaN
DevilaN

Reputation: 1464

Using MEDIAN with GROUP BY

As of MariaDB 10.3.3 there exist MEDIAN function. Unfortunately there is a little problem when I try to use it with GROUP BY statement (currently using v10.3.9).

Given following table:

CREATE TABLE testmed
  (
     id       INT NOT NULL auto_increment,
          PRIMARY KEY(id),
     group_id INT NOT NULL DEFAULT 0,
     score    INT NOT NULL DEFAULT 0
  ); 

Filling it up with some data:

INSERT INTO testmed (group_id, score) 
VALUES (1,1), (1,2), (1,2), (1,2), (1,3), (2,5), (2,7), (2,9), (2,11), (2,11);

Now I am getting different results with and without GROUP BY in query:

MariaDB [test]> SELECT group_id, score, MEDIAN(score) OVER (PARTITION BY group_id) FROM testmed;
+----------+-------+--------------------------------------------+
| group_id | score | MEDIAN(score) OVER (PARTITION BY group_id) |
+----------+-------+--------------------------------------------+
|        1 |     1 |                               2.0000000000 |
|        1 |     2 |                               2.0000000000 |
|        1 |     2 |                               2.0000000000 |
|        1 |     2 |                               2.0000000000 |
|        1 |     3 |                               2.0000000000 |
|        2 |     5 |                               9.0000000000 |
|        2 |     7 |                               9.0000000000 |
|        2 |     9 |                               9.0000000000 |
|        2 |    11 |                               9.0000000000 |
|        2 |    11 |                               9.0000000000 |
+----------+-------+--------------------------------------------+
10 rows in set (0.000 sec)
MariaDB [test]> SELECT group_id, score, MEDIAN(score) OVER (PARTITION BY group_id) FROM testmed GROUP BY group_id;
+----------+-------+--------------------------------------------+
| group_id | score | MEDIAN(score) OVER (PARTITION BY group_id) |
+----------+-------+--------------------------------------------+
|        1 |     1 |                               1.0000000000 |
|        2 |     5 |                               5.0000000000 |
+----------+-------+--------------------------------------------+

First one is correct, but why it is not working properly with GROUP BY. Currently I am using query nesting like that:

MariaDB [test]> SELECT * FROM (SELECT group_id, score, MEDIAN(score) OVER (PARTITION BY group_id) FROM testmed) t GROUP BY group_id;
+----------+-------+--------------------------------------------+
| group_id | score | MEDIAN(score) OVER (PARTITION BY group_id) |
+----------+-------+--------------------------------------------+
|        1 |     1 |                               2.0000000000 |
|        2 |     5 |                               9.0000000000 |
+----------+-------+--------------------------------------------+
2 rows in set (0.000 sec)

but it feels so wrong doing it that way.

What is right way to do it?

Upvotes: 3

Views: 5226

Answers (1)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521249

Your second query is technically invalid:

SELECT
    group_id,
    score,
    MEDIAN(score) OVER (PARTITION BY group_id)
FROM testmed
GROUP BY group_id;

The reason it is invalid is because you are selecting score which does not appear in the GROUP BY clause. The issue here is which value of score do you intend the database to use for each group_id? What appears to be happening here is that MariaDB is arbitrarily choosing the minimum value of score. But since there is only a single score value, the median just returns that single value.

Keep in mind that analytic functions are evaluated after the GROUP BY aggregation takes place. I think this is the query you were intending to run:

SELECT DISTINCT
    group_id,
    MEDIAN(score) OVER (PARTITION BY group_id) score_median
FROM testmed;

If this doesn't work, because MariaDB doesn't like using DISTINCT with MEDIAN, then you can try subquerying:

SELECT DISTINCT
    group_id,
    score_median
FROM
(
    SELECT
        group_id,
        MEDIAN(score) OVER (PARTITION BY group_id) score_median
    FROM testmed
) t;

Upvotes: 2

Related Questions