Reputation: 323
I was able to write the query to find the median using the following logic below, where I am having trouble is trying to understand the logic. Can someone please help me understand what is going on.I got the code from an advance sql book.
Specifically will this code work for both odd and even numbers. I tried the code and it works ,but I am very curious to understand the logic.
select avg(sales) as median
from
(select g1.sales
from ga g1, ga g2
group by g1.sales
having sum(case when g1.sales = g2.sales then 1 ELSE 0 END) >= ABS(SUM(SIGN(g1.sales-g2.sales))))g3;
Upvotes: 2
Views: 89
Reputation: 9129
This groups a Cartesian product by sales number to find the "middle" 1 or 2 sales and then averages the result to given the median. See detailed comments in-line.
--the subquery will return the 1 or 2 middle values and the average of those is the median
select avg(sales * 1.0) as median
from (
select g1.sales
--this creates a cartesian product of ga with itself
from ga g1, ga g2
--and then group by sales, which allows comparision of each given sales figure all others
group by g1.sales
having
--the sum(case) here acts a count of row in the cartesian product that have matching sales values
--this will be the the square of the count() from ga where for each given sales number
sum(
case
when g1.sales = g2.sales
then 1
ELSE 0
END)
>= --The comparison acts as a de-weighting mechanism to handle duplicate sales numbers
--Such that if I have the same sales figure twice I'll have 4 rows in the Cartesian product
--and I should see a matching 4 or 0 if the row can be used in the final median calculation
--the abs(sum(sign())) here acts as a measure of how far of the median each sales is
--by looking at how many sales are greater then, equal, a lesser. The sales at or nearest
--the median will have the lowest numbers here.
ABS(
SUM(
SIGN(g1.sales-g2.sales)
)
)
)g3;
Upvotes: 1