user6269144
user6269144

Reputation: 318

SQL - Calculating median without sorting

I have an SQL table with one column of integer values and I would like to find the median of those values without any sorting. I first tried finding the numbers in the table that are greater than and less than each value:

SELECT DISTINCT d1.v,       
  (SELECT COUNT(d2.v)
   FROM Values d2  
   WHERE d2.v > d1.v
   ) AS greater_than,
   (SELECT COUNT(d2.v)
   FROM Values d2  
   WHERE d2.v < d1.v
   ) AS less_than
FROM Values d1

enter image description here

I'm not sure how to proceed. I believe I want the values in the above table where greater_than and less_than are both equal to num_entries / 2, but that would only work for a table with an even number of entries. What's a good way to get the median from what I have above?

Upvotes: 3

Views: 1049

Answers (2)

Dwipam Katariya
Dwipam Katariya

Reputation: 144

Look for total numbers for greater than and less than self number that is equal to the total number of rows/2.

create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);

select (MIN(count)+MAX(count))/2 from 
(select case when (select count(*) from 
med A where A.id<B.id)=(select count(*)/2 from med) OR 
(select count(*) from med A where A.id>B.id)=(select count(*)/2 
from med) then cast(B.id as float)end as count from med B) C;

 ?column? 
----------
  3.5
(1 row)

You need cast to convert to float, otherwise you would get 3 here.

Upvotes: 0

trincot
trincot

Reputation: 349989

You could do it like this:

SELECT (MIN(v)+MAX(v))/2
FROM   (
        SELECT  CASE WHEN
                        LEAST((SELECT COUNT(*) FROM tbl WHERE v <= d1.v),
                              (SELECT COUNT(*) FROM tbl WHERE v >= d1.v))
                           >= (SELECT COUNT(*)/2 FROM tbl) 
                     THEN v
                END as v
        FROM    tbl d1
       ) as sub

The inner query is guaranteed to return 1 or 2 distinct non-null values among potentially many null values. The non-null values may repeat, but by taking the minimum and maximum those two values can be used to calculate the median.

NB: Don't name your table Values: it is a reserved word.

Upvotes: 2

Related Questions