Richard Smith
Richard Smith

Reputation: 3437

Calculating median with a GROUP BY in MariaDB

I'm running MariaDB 10.3 which supports the MEDIAN function, and in simple cases this function seems to work well. However I'm having trouble making it work in complex situations. Currently I'm trying to calculate the number, mean and median value for each of several types of item. Here's a MWE:

CREATE TABLE items (type INTEGER, value INTEGER);
INSERT INTO items (type, value) VALUES (1,1), (1,3), (1,4), (2,10), (2,32), (2,42);
SELECT type, COUNT(*) AS count, AVG(value) AS mean, 
  MEDIAN(value) OVER(PARTITION BY type) AS median FROM items GROUP BY type;
DROP TABLE items;

This calculates the count and mean correctly, but not the median:

type count mean median
1 3 2.6667 1.0000000000
2 3 28.0000 10.0000000000

I expect it's some silly error in my SQL, but I can't for the life of me see it, and no errors or warnings are reported. I've tried dropping the PARTITION BY type which changes the medians returned but they're still not correct. Removing the OVER clause results in a syntax error.

I don't understand why this isn't working. If I just do the following, the median is calculated correctly, though with lots of duplicate lines, but adding GROUP BY type gives the erroneous values above.

SELECT type, MEDIAN(value) OVER(PARTITION BY type) AS median FROM items;

Upvotes: 0

Views: 534

Answers (1)

forpas
forpas

Reputation: 164064

This query:

SELECT type, 
       COUNT(*) AS count, 
       AVG(value) AS mean 
FROM items 
GROUP BY type;

is an aggregation query and if you add to it the window function MEDIAN() it will be calculated on the results that it returns which are:

type count mean
1 3 2.6667
2 3 28.0000

with an arbitrary value for value (which seems to be the first of each partition).

What you should do is get the median in a subquery and then aggregate:

SELECT type, 
       COUNT(*) AS count, 
       AVG(value) AS mean,
       median
FROM (       
  SELECT *, MEDIAN(value) OVER (PARTITION BY type) AS median 
  FROM items
) t
GROUP BY type, median;

Or, use only window functions:

SELECT DISTINCT type, 
       COUNT(*) OVER (PARTITION BY type) AS count, 
       AVG(value) OVER (PARTITION BY type) AS mean,
       MEDIAN(value) OVER (PARTITION BY type) AS median
FROM items;

See the demo.

Upvotes: 1

Related Questions