Erin L
Erin L

Reputation: 95

SQL: Select the largest value for a group

I have a table like this:

someOtherCols shelf type count
... row1 A 2
... row1 B 3
... row2 C 2
... row2 D 2

I would like to group by shelf, and only keep the type that has the highest count. If there's a tie, choose any row.

So the result would be like:

someOtherCols shelf type count
... row1 B 3
... row2 C 2

I'm using AWS Athena and I'm trying the following query that I saw from another answer I saw:

SELECT * FROM table
WHERE count IN (MAX(count) FROM shoppingAggregate GROUP BY someOtherCols, shelf)

Seems like Athena does not like it. How can I achieve this? Thanks a lot!

Upvotes: 0

Views: 747

Answers (3)

dmg
dmg

Reputation: 4481

Use window functions:

you will have to do an over for each attribute in "some other cols". I used min below (you have to make the query deterministic). You can choose what you prefer.

select distinct -- you need to remove duplicates
           min(othercol) over (partition by row), 
           row, max(type) over (partition by row), 
           count(*) over (partition by row)
   from relation;

sqlite will allow you to have a non-determinist query that will be significantly simpler using a group by (but it will not work in other dbms)

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270703

Presto offers the function max_by() which I think is in Athena as well:

select shelf, max(count), max_by(type, count)
from t
group by shelf

Upvotes: 0

Erin L
Erin L

Reputation: 95

I tried to use ROW_NUMBER() and was able to solve this problem:

ranked AS (
  SELECT table.*, ROW_NUMBER() OVER (PARTITION BY othercols, shelf ORDER BY counts DESC) AS rn
  FROM table
)
SELECT * FROM ranked WHERE rn = 1;

Upvotes: 0

Related Questions