SheerSt
SheerSt

Reputation: 3337

Group by and exclude minimum and maximum from results

I have data similar to the following:

Date        ID          Amount
10-Jun-14   978500302   163005350
17-Jun-14   978500302   159947117
24-Jun-14   978500302   159142342
1-Jul-14    978500302   159623201
8-Jul-14    978500302   143066033
14-Jul-14   978500302   145852027
15-Jul-14   978500302   148595751

Is there a way in oracle that I can get an average of this data, which excludes the highest and lowest value? I can get the overall average by doing a GROUP BY ID, and then AVG(Amount). But how can I do this while excluding min and max?

Upvotes: 0

Views: 15399

Answers (4)

Alex Poole
Alex Poole

Reputation: 191235

Based on the comment saying "Exactly one min and one max need to be dropped, no matter what", a variation of Gordon's approach would always exclude one row only if the min or max amount was duplicated:

select id, avg(amount)
from (
  select id, amount,
    row_number() over (partition by id order by amount) as min_rn,
    row_number() over (partition by id order by amount desc) as max_rn
  from t42
)
where min_rn > 1
and max_rn > 1
group by id;

This would behave the same as Gordon's if you used rank() or dense_rank() rather than row_number() as they allow duplicate results. Any of these will only hit the real table once.

To see how the ranking works, with your min and max amounts repeated for effect:

select dt, id, amount,
  row_number() over (partition by id order by amount) as min_rn,
  row_number() over (partition by id order by amount desc) as max_rn,
  rank() over (partition by id order by amount) as min_rnk,
  rank() over (partition by id order by amount desc) as max_rnk
from t42;

DT                ID     AMOUNT     MIN_RN     MAX_RN    MIN_RNK    MAX_RNK
--------- ---------- ---------- ---------- ---------- ---------- ----------
23-JUL-14  978500302  143066033          1          8          1          8 
08-JUL-14  978500302  143066033          2          9          1          8 
14-JUL-14  978500302  145852027          3          7          3          7 
15-JUL-14  978500302  148595751          4          6          4          6 
24-JUN-14  978500302  159142342          5          5          5          5 
01-JUL-14  978500302  159623201          6          4          6          4 
17-JUN-14  978500302  159947117          7          3          7          3 
24-JUL-14  978500302  163005350          8          2          8          1 
10-JUN-14  978500302  163005350          9          1          8          1 

With the same data, using `row_number() gives:

        ID AVG(AMOUNT)
---------- -----------
 978500302   154175974 

Using rank() gives:

        ID AVG(AMOUNT)
---------- -----------
 978500302   154632088 

SQL Fiddle.

You will get no result for an ID with only 1 or 2 amounts, which may be what you want to happen, and might make sense for consistency. If you did want to allow for that possibility though and include the min and max when those are the only values, you could also count how many raw rows there are for each ID and include the min and max if there aren't enough:

select id, avg(amount)
from (
  select id, amount,
    count(*) over (partition by id) as cnt,
    row_number() over (partition by id order by amount) as min_rn,
    row_number() over (partition by id order by amount desc) as max_rn
  from t42
)
where (cnt < 3 or min_rn > 1)
and (cnt < 3 or max_rn > 1)
group by id;

Upvotes: 0

Anup Agrawal
Anup Agrawal

Reputation: 6669

Try something like this

SELECT ID, 
(SUM(Amount)-Min(Amount)-Max(Amount))/(COUNT(Amount)-2) AS AVG
FROM yourTbl
Group By ID

As pointed out by @Clockwork-Muse this will work only when the row count is more than 2.

Upvotes: 1

Brian DeMilia
Brian DeMilia

Reputation: 13248

select id, avg(amount)
  from tbl t
 where id not in (select id from tbl group by id having count(distinct amount) > 2)
    or (amount <> (select min(x.amount) from tbl x where x.id = t.id)
   and amount <> (select max(x.amount) from tbl x where x.id = t.id))
 group by id

The first line in the WHERE clause is there to retain IDs that do not have more than 2 values. They would otherwise be excluded from the results.

If you would rather they be excluded, you can get rid of that line.

You could also try the following:

select id, avg(amount)
  from (select id, amount
          from tbl
        minus
        select id, min(amount)
          from tbl
         group by id
        minus
        select id, max(amount)
          from tbl
         group by id)
 group by id

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269513

The easiest way is to use analytic functions to get the minimum and maximum values before aggregating:

select id, avg(amount)
from (select d.*,
             min(amount) over (partition by id) as mina,
             max(amount) over (partition by id) as maxa
      from data d
     ) d
where amount > mina and amount < maxa
group by id;

Upvotes: 6

Related Questions