davr
davr

Reputation: 19137

Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.

Example data:

id | val
--------
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, versus SELECT AVG(val) which == 5.

Upvotes: 274

Views: 414840

Answers (30)

Penny Liu
Penny Liu

Reputation: 17388

My response isn't the shortest, but it's clear and easy to follow.

Here’s a detailed description of the SQL query using a Common Table Expression (CTE) to calculate the median:

Input Format

The student table is described as follows:

Sample Input

RankedData CTE

The RankedData CTE generates a table where each grade is ranked by its order, and the total number of rows in the dataset is calculated.

WITH RankedData AS (
    SELECT
        grade,
        ROW_NUMBER() OVER (ORDER BY grade) AS row_num,
        COUNT(*) OVER () AS total_rows
    FROM student
)

Output of RankedData:

RankedData

Main Query

The main query calculates the median using a CASE statement to handle two scenarios:

SELECT
    ROUND(
            CASE
                WHEN total_rows % 2 = 1 THEN
                    -- Odd number, pick the middle one
                    (SELECT grade FROM RankedData WHERE row_num = (total_rows + 1) / 2)
                ELSE
                    -- Even number, average the two middle ones
                    (SELECT AVG(grade)
                     FROM RankedData
                     WHERE row_num IN (total_rows / 2, total_rows / 2 + 1))
                END,
            2
    ) AS median_grade
FROM RankedData
LIMIT 1;

Query Output

Query Output

DataGrip

Upvotes: 0

Mahoor13
Mahoor13

Reputation: 5617

A different way to calculate the Median is using JSON functions in MySQL 5.7+, 8+ and MariaDB 10.2+.

This is my stored function tested in MySQL 8.0:

CREATE FUNCTION JSON_MEDIAN(input_json JSON)
RETURNS FLOAT NO SQL
BEGIN
    DECLARE median FLOAT;
    DECLARE middle INT;
    DECLARE arr_length INT;
    DECLARE peek_count INT;
    
    -- count non-empty items
    SELECT COUNT(*) INTO arr_length
    FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
    WHERE item IS NOT NULL;
    -- peek 1 item if length is odd or 2 items if length is even
    SET peek_count = 2 - arr_length % 2;
    SET middle = CEIL(arr_length / 2) - 1;
    
    SELECT AVG(item) INTO median 
    FROM (
        SELECT item
        FROM JSON_TABLE(input_json, '$[*]' COLUMNS (item FLOAT PATH '$')) s1
        WHERE item IS NOT NULL
        ORDER BY item
        LIMIT middle, peek_count
    ) s2;
    
    RETURN median;
END

You can now use this function with JSON arrays containing numeric items or create the input using the JSON_ARRAYAGG function like this:

SELECT JSON_MEDIAN(JSON_ARRAYAGG(`val`))
FROM `my_table`

This method does not have GROUP_CONCAT limits.

Upvotes: 3

StanislavKo
StanislavKo

Reputation: 410

If you need median per group then use "PARTITION BY" in ROW_NUMBER() OVER (...)

WITH Numbered AS 
(
  SELECT groupingColumn, 
  val,
  COUNT(*) OVER (partition by groupingColumn) AS Cnt,
  ROW_NUMBER() OVER (partition by groupingColumn ORDER BY val) AS RowNum
  FROM yourtable
)
SELECT groupingColumn, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
ORDER BY groupingColumn
;

Upvotes: 3

bhawesh mehta
bhawesh mehta

Reputation: 1

Let's create a sample table named numbers

This Answer is specific to the MySQL database

In PostgresSql its simple use per_cont function

CREATE TABLE numbers(
  num INT,
  frequency INT
);

Insert values in numbers table

INSERT INTO numbers VALUES  
        (0,7),
        (1,1),
        (2,3),
        (3,1),
        (9,1),
        (1,1),
        (2,3),
        (3,1),
        (9,1);

-- select * from numbers 

WITH RECURSIVE num_frequency (num,frequency, i) AS 
(
SELECT  num,frequency,1
FROM   numbers
UNION ALL
SELECT num,frequency,i+1
FROM   num_frequency
WHERE  num_frequency.i < num_frequency.frequency
)

select * 
(max(case when numbers=lower_limit then num else null end)/2
+max(case when numbers=upper_limit then num else null end)/2) as median
from (
select *,total_number%2,
case 
when  total_number%2=0 then total_number/2
else  (total_number+1)/2 end as lower_limit,
case 
when  total_number%2=0 then total_number/2+1
else  (total_number+1)/2
end as upper_limit

from (
select *,max(numbers) over() as total_number from (
select num,row_number() over (order by num) 
as numbers from num_frequency
)b 
)b
)b

Upvotes: 0

TheJacobTaylor
TheJacobTaylor

Reputation: 4143

I just found another answer online in the comments:

For medians in almost any SQL:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2

Make sure your columns are well indexed and the index is used for filtering and sorting. Verify with the explain plans.

select count(*) from table --find the number of rows

Calculate the "median" row number. Maybe use: median_row = floor(count / 2).

Then pick it out of the list:

select val from table order by val asc limit median_row,1

This should return you one row with just the value you want.

Upvotes: 80

jentek llc
jentek llc

Reputation: 71

You can use window function row_number() to answer the query to find medium

select val 
from (select val, row_number() over (order by val) as rownumber, x.cnt 
from data, (select count(*) as cnt from data) x) abc
where rownumber=ceil(cnt/2);

Upvotes: 1

AKASH SINGH
AKASH SINGH

Reputation: 31

I am using the below table for the solution in MySQL:

CREATE TABLE transactions (
  transaction_id int , user_id int , merchant_name varchar(255), transaction_date date , amount int
);

INSERT INTO transactions (transaction_id, user_id, merchant_name, transaction_date, amount)  
VALUES (1, 1 ,'abc', '2015-08-17', 100),(2, 2, 'ced', '2015-2-17', 100),(3, 1, 'def', '2015-2-16', 121),
(4, 1 ,'ced', '2015-3-17', 110),(5, 1, 'ced', '2015-3-17', 150),(6, 2 ,'abc', '2015-4-17', 130), 
(7, 3 ,'ced', '2015-12-17', 10),(8, 3 ,'abc', '2015-8-17', 100),(9, 2 ,'abc', '2015-12-17', 140),(10, 1,'abc', '2015-9-17', 100),
(11, 1 ,'abc', '2015-08-17', 121),(12, 2 ,'ced', '2015-12-23', 130),(13, 1 ,'def', '2015-12-23', 13),(3, 4, 'abc', '2015-2-16', 120),(3, 4, 'def', '2015-2-16', 121),(3, 4, 'ced', '2015-2-16', 121);

Calculating Median for 'amount' column:

WITH Numbered AS 
(
SELECT *, COUNT(*) OVER () AS TotatRecords,
    ROW_NUMBER() OVER (ORDER BY amount) AS RowNum
FROM transactions
)
SELECT Avg(amount)
FROM Numbered
WHERE RowNum IN ( FLOOR((TotatRecords+1)/2), FLOOR((TotatRecords+2)/2) )
;

TotalRecords = 16 and Median = 120.5000

This query will work for both the conditions i.e. Even and Odd records.

Upvotes: 3

Nikhil Agarwal
Nikhil Agarwal

Reputation: 21

For a table station and column lat_n, here is MySQL code to get the median:

set @rows := (select count(1) from station);
set @v1 := 0;
set @sql1 := concat('select lat_n into @v1 from station order by lat_n asc limit 1 offset ', ceil(@rows/2) - 1);
prepare statement1 from @sql1;
execute statement1;
set @v2 := 0;
set @sql2 := concat('select lat_n into @v2 from station order by lat_n asc limit 1 offset ', ceil((@rows + 1)/2) - 1);
prepare statement2 from @sql2;
execute statement2;
select (@v1 + @v2)/2;

Upvotes: 0

Cameron
Cameron

Reputation: 41

If this is MySQL, there are window functions now and you can just do it this way (assuming you want to round up to nearest integer - otherwise just replace ROUND with CEIL or FLOOR or what have you). The following solution works for tables regardless of whether they have an even number of rows or an odd number of rows:


WITH CTE AS (
    SELECT val,
            ROW_NUMBER() OVER (ORDER BY val ASC) AS rn,
            COUNT(*) OVER () AS total_count
    FROM data
)
SELECT ROUND(AVG(val)) AS median
FROM CTE
WHERE
    rn BETWEEN
    total_count / 2.0 AND
    total_count / 2.0 + 1;

I think some of the more recent answers on this thread were already getting at this approach, but it also seemed like people were overthinking it, so consider this an improved version. Regardless of SQL flavor, there is no reason anyone should be writing a huge paragraph of code with multiple subqueries just to get the median in 2021. However, please note that the above query only works if you're asked to find the median for a continuous series. Of course, regardless of row number, sometimes people do make a distinction between what is referred to as the Discrete Median and what is referred to as the Interpolated Median for a continuous series.

If you're asked to find the median for a discrete series and the table has an even number of rows, the above solution will not work for you, and you should revert to using one of the other solutions, like TheJacobTaylor's.

The second solution below is a slightly modified version of TheJacobTaylor's, where I explicitly state CROSS JOIN. This will work for tables that have an odd number of rows too, regardless of whether you're asked to find the median for a continuous or discrete series, but I would specifically use this when asked to find the median of a discrete series. Otherwise, use the first solution. That way, you'll never have to think about whether the data contains an 'even' or 'odd' number of data points.


SELECT x.val AS median
FROM data x
CROSS JOIN data y
GROUP BY x.val
HAVING SUM(SIGN(1 - SIGN(y.val - x.val))) = (COUNT(*) + 1) / 2;

Finally, you can easily do this in PostgreSQL using built-in functions. Here is a nice explanation, along with an effective summary on discrete vs interpolated medians.

https://leafo.net/guides/postgresql-calculating-percentile.html#calculating-the-median

Upvotes: 0

Gaurav Kumar
Gaurav Kumar

Reputation: 211

Simple Solution For ORACLE:

SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;

Easy Solution to Understand For MySQL:

select case MOD(count(lat_n),2) 
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = (select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N))
end from station;

Explanation

STATION is table name. LAT_N is the column name having numeric value

Suppose there are 101 records(odd number) in station table. This means that the median is 51st record if the tabled sorted either asc or desc.

In above query for every S.LAT_N of S table I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if they are matched then I am selecting that S.LAT_N value. When I check for 51st records there are 50 values less than 51st record and there 50 records greater than 51st record. As you see, there are 50 records in both tables. So this is our answer. For every other record there are different number of records in two tables created for comparison. So, only 51st record meets the condition.

Now suppose there are 100 records(even number) in station table. This means that the median is average of 50th and 51st records if the tabled sorted either asc or desc.

Same as odd logic I am creating two tables. One for number of LAT_N values less than S.LAT_N and another for number of LAT_N values greater than S.LAT_N. Later I am comparing these two tables and if their difference is equal to 1 then I am selecting that S.LAT_N value and find the average. When I check for 50th records there are 49 values less than 50th record and there 51 records greater than 50th record. As you see, there is difference of 1 record in both tables. So this(50th record) is our 1st record for average. Similarly, When I check for 51st records there are 50 values less than 51st record and there 49 records greater than 51st record. As you see, there is difference of 1 record in both tables. So this(51st record) is our 2nd record for average. For every other record there are different number of records in two tables created for comparison. So, only 50th and 51st records meet the condition.

Upvotes: 2

Sudhanshu
Sudhanshu

Reputation: 732

The most simple and fast way to calculate median in mysql.

select x.col
from   (select lat_n, 
               count(1) over (partition by 'A')        as total_rows, 
               row_number() over (order by col asc) as rank_Order 
        from   station ft) x 
where  x.rank_Order = round(x.total_rows / 2.0, 0) 

Upvotes: 1

Victor Vulovic
Victor Vulovic

Reputation: 621

I have not compared the performance of this solution to the rest of the answers posted here, but I found this to be the most straight-forward to understand, and covers the full extent of the mathematical formula for calculating a median. In other words, this solution will be robust enough for even- and odd-numbered data sets:

SELECT CASE 
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM <data>) t1,
       (SELECT COUNT(*) AS num_records FROM <data>) t2
 WHERE t1.rownum =(t2.num_records - 1) / 2) as low_bound,
 (SELECT t1.<value>
  FROM (SELECT <value>, 
               ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
          FROM station) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.rownum =(t2.num_records + 1) / 2) as up_bound)
END
FROM <data>

Upvotes: 0

Abhishek Sengupta
Abhishek Sengupta

Reputation: 3291

Try something like :

SELECT  
CAST (AVG(val) AS DECIMAL(10,4))
FROM
(
    SELECT 
    val,
    ROW_NUMBER() OVER( ORDER BY val ) -1 AS rn,
    COUNT(1) OVER () -1 AS cnt
    FROM STATION
) as tmp
WHERE rn IN (FLOOR(cnt/2),CEILING (cnt/2))

**

Note : The reason for -1 is to make it zero indexed..i.e row number now starts from 0 instead of 1

**

Upvotes: 0

Arushi
Arushi

Reputation: 36

The below query will work perfect for both even or odd number of rows. In the subquery, we are finding the value(s) which has same number of rows before and after it. In case of odd rows the having clause will evaluate to 0 (same number of rows before and after cancels out the sign).

Similarly, for even rows the having clause evaluates to 1 for two rows (the center 2 rows) because they will (collectively) have same number of rows before and after.

In the outer query, we will avg out either the single value (in case of odd rows) or (2 values in case of even rows).

select avg(val) as median
from
(
    select d1.val
    from data d1 cross join data d2
    group by d1.val
    having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub

Note: In case your table has duplicate values, the above having clause should be changed to the below condition. In this case, there could be values outside of the original possibilities of 0,1. The below condition will make this condition dynamic and work in case of duplicates too.

having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))

Upvotes: 0

Spandyie
Spandyie

Reputation: 945

A simple way to calculate Median in MySQL

set @ct := (select count(1) from station);
set @row := 0;

select avg(a.val) as median from 
(select * from  table order by val) a
where (select @row := @row + 1)
between @ct/2.0 and @ct/2.0 +1;

Upvotes: 1

Khurram Shehzad
Khurram Shehzad

Reputation: 56

Single query to archive the perfect median:

SELECT 
COUNT(*) as total_rows, 
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median, 
AVG(val) as average 
FROM 
data

Upvotes: 3

rhanqtl
rhanqtl

Reputation: 113

MySQL has supported window functions since version 8.0, you can use ROW_NUMBER or DENSE_RANK (DO NOT use RANK as it assigns the same rank to same values, like in sports ranking):

SELECT AVG(t1.val) AS median_val
  FROM (SELECT val, 
               ROW_NUMBER() OVER(ORDER BY val) AS rownum
          FROM data) t1,
       (SELECT COUNT(*) AS num_records FROM data) t2
 WHERE t1.row_num IN
       (FLOOR((t2.num_records + 1) / 2), 
        FLOOR((t2.num_records + 2) / 2));

Upvotes: 1

bob
bob

Reputation: 605

Unfortunately, neither TheJacobTaylor's nor velcrow's answers return accurate results for current versions of MySQL.

Velcro's answer from above is close, but it does not calculate correctly for result sets with an even number of rows. Medians are defined as either 1) the middle number on odd numbered sets, or 2) the average of the two middle numbers on even number sets.

So, here's velcro's solution patched to handle both odd and even number sets:

SELECT AVG(middle_values) AS 'median' FROM (
  SELECT t1.median_column AS 'middle_values' FROM
    (
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
    ) AS t1,
    (
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
    ) AS t2
    -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
    WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

To use this, follow these 3 easy steps:

  1. Replace "median_table" (2 occurrences) in the above code with the name of your table
  2. Replace "median_column" (3 occurrences) with the column name you'd like to find a median for
  3. If you have a WHERE condition, replace "WHERE 1" (2 occurrences) with your where condition

Upvotes: 34

Leonardo Nicolas
Leonardo Nicolas

Reputation: 352

Install and use this mysql statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/

After that, calculate median is easy:

SELECT median(val) FROM data;

Upvotes: 15

Kwex
Kwex

Reputation: 4020

I found this answer very helpful - https://www.eversql.com/how-to-calculate-median-value-in-mysql-using-a-simple-sql-query/

SET @rowindex := -1;

SELECT
   AVG(g.grade)
FROM
   (SELECT @rowindex:=@rowindex + 1 AS rowindex,
       grades.grade AS grade
    FROM grades
    ORDER BY grades.grade) AS g
WHERE
g.rowindex IN (FLOOR(@rowindex / 2) , CEIL(@rowindex / 2));

Upvotes: 0

velcrow
velcrow

Reputation: 6516

In MariaDB / MySQL:

SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM data d, (SELECT @rownum:=0) r
  WHERE d.val is NOT NULL
  -- put some where clause here
  ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

Steve Cohen points out, that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.

Also AVG(dd.val) and dd.row_number IN(...) is used to correctly produce a median when there are an even number of records. Reasoning:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

Finally, MariaDB 10.3.3+ contains a MEDIAN function

Upvotes: 272

Veeramani Natarajan
Veeramani Natarajan

Reputation: 342

The following SQL Code will help you to calculate the median in MySQL using user defined variables.

create table employees(salary int);

insert into employees values(8);
insert into employees values(23);
insert into employees values(45);
insert into employees values(123);
insert into employees values(93);
insert into employees values(2342);
insert into employees values(2238);

select * from employees;

Select salary from employees  order by salary;

set @rowid=0;
set @cnt=(select count(*) from employees);
set @middle_no=ceil(@cnt/2);
set @odd_even=null;

select AVG(salary) from 
(select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status  from employees  order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);

If you are looking for detailed explanation, please refer this blog.

Upvotes: 0

Prashant Srivastav
Prashant Srivastav

Reputation: 107

I have this below code which I found on HackerRank and it is pretty simple and works in each and every case.

SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE  
  (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) = 
  (SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );

Upvotes: 7

Danylo Zherebetskyy
Danylo Zherebetskyy

Reputation: 1517

Often, we may need to calculate Median not just for the whole table, but for aggregates with respect to our ID. In other words, calculate median for each ID in our table, where each ID has many records. (good performance and works in many SQL + fixes problem of even and odds, more about performance of different Median-methods https://sqlperformance.com/2012/08/t-sql-queries/median )

SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val, 
  COUNT(*) OVER (PARTITION BY our_id) AS cnt,
  ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
  FROM our_table
) AS x
WHERE rn IN ((cnt + 1)/2, (cnt + 2)/2) GROUP BY our_id;

Hope it helps

Upvotes: 2

Dwipam Katariya
Dwipam Katariya

Reputation: 144

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

OR

select cast(avg(id) as float) from 
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id 
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;

Upvotes: 1

RobbertNix
RobbertNix

Reputation: 1

These methods select from the same table twice. If the source data are coming from an expensive query, this is a way to avoid running it twice:

select KEY_FIELD, AVG(VALUE_FIELD) MEDIAN_VALUE
from (
    select KEY_FIELD, VALUE_FIELD, RANKF
    , @rownumr := IF(@prevrowidr=KEY_FIELD,@rownumr+1,1) RANKR
    , @prevrowidr := KEY_FIELD
    FROM (
        SELECT KEY_FIELD, VALUE_FIELD, RANKF
        FROM (
            SELECT KEY_FIELD, VALUE_FIELD 
            , @rownumf := IF(@prevrowidf=KEY_FIELD,@rownumf+1,1) RANKF
            , @prevrowidf := KEY_FIELD     
            FROM (
                SELECT KEY_FIELD, VALUE_FIELD 
                FROM (
                    -- some expensive query
                )   B
                ORDER BY  KEY_FIELD, VALUE_FIELD
            ) C
            , (SELECT @rownumf := 1) t_rownum
            , (SELECT @prevrowidf := '*') t_previd
        ) D
        ORDER BY  KEY_FIELD, RANKF DESC
    ) E
    , (SELECT @rownumr := 1) t_rownum
    , (SELECT @prevrowidr := '*') t_previd
) F
WHERE RANKF-RANKR BETWEEN -1 and 1
GROUP BY KEY_FIELD

Upvotes: 0

Ariel Allon
Ariel Allon

Reputation: 852

Based on @bob's answer, this generalizes the query to have the ability to return multiple medians, grouped by some criteria.

Think, e.g., median sale price for used cars in a car lot, grouped by year-month.

SELECT 
    period, 
    AVG(middle_values) AS 'median' 
FROM (
    SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
    FROM (
        SELECT 
            @last_period:=@period AS 'last_period',
            @period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
            IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`, 
            x.sale_price
          FROM listings AS x, (SELECT @row:=0) AS r
          WHERE 1
            -- where criteria goes here
          ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
        ) AS t1
    LEFT JOIN (  
          SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
          FROM listings x
          WHERE 1
            -- same where criteria goes here
          GROUP BY DATE_FORMAT(sale_date, '%Y%m')
        ) AS t2
        ON t1.period = t2.period
    ) AS t3
WHERE 
    row_num >= (count/2) 
    AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;

Upvotes: 1

Nochum Sossonko
Nochum Sossonko

Reputation: 21

SELECT 
    SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(field ORDER BY field),
            ',',
            ((
                ROUND(
                    LENGTH(GROUP_CONCAT(field)) - 
                    LENGTH(
                        REPLACE(
                            GROUP_CONCAT(field),
                            ',',
                            ''
                        )
                    )
                ) / 2) + 1
            )),
            ',',
            -1
        )
FROM
    table

The above seems to work for me.

Upvotes: 2

yuhanluo
yuhanluo

Reputation: 19

This way seems include both even and odd count without subquery.

SELECT AVG(t1.x)
FROM table t1, table t2
GROUP BY t1.x
HAVING SUM(SIGN(t1.x - t2.x)) = 0

Upvotes: 0

ADC
ADC

Reputation: 1

set @r = 0;

select  
    case when mod(c,2)=0 then round(sum(lat_N),4)
    else round(sum(lat_N)/2,4) 
    end as Med  
from 
    (select lat_N, @r := @r+1, @r as id from station order by lat_N) A
    cross join
    (select (count(1)+1)/2 as c from station) B
where id >= floor(c) and id <=ceil(c)

Upvotes: -1

Related Questions