I have a table like this: someOtherCols shelf type count ... row1 A 2 ... row1 B 3 ... row2 C 2 ... row2 D 2 I would like to group by shelf, and only keep the type that has the highest count. If there's a tie, choose any row. So the result would be like: someOtherCols shelf type count ... row1 B 3 ... row2 C 2 I'm using AWS Athena and I'm trying the following query that I saw from another answer I saw: SELECT * FROM table WHERE count IN (MAX(count) FROM shoppingAggregate GROUP BY someOtherCols, shelf) Seems like Athena does not like it. How can I achieve this? Thanks a lot!

Reputation: 95

SQL: Select the largest value for a group

I have a table like this:

someOtherCols	shelf	type	count
...	row1	A	2
...	row1	B	3
...	row2	C	2
...	row2	D	2

I would like to group by shelf, and only keep the type that has the highest count. If there's a tie, choose any row.

So the result would be like:

someOtherCols	shelf	type	count
...	row1	B	3
...	row2	C	2

I'm using AWS Athena and I'm trying the following query that I saw from another answer I saw:

SELECT * FROM table
WHERE count IN (MAX(count) FROM shoppingAggregate GROUP BY someOtherCols, shelf)

Seems like Athena does not like it. How can I achieve this? Thanks a lot!

Upvotes: 0

Answers (3)

dmg

Reputation: 4481

Use window functions:

you will have to do an over for each attribute in "some other cols". I used min below (you have to make the query deterministic). You can choose what you prefer.

select distinct -- you need to remove duplicates
           min(othercol) over (partition by row), 
           row, max(type) over (partition by row), 
           count(*) over (partition by row)
   from relation;

sqlite will allow you to have a non-determinist query that will be significantly simpler using a group by (but it will not work in other dbms)

Upvotes: 1

Gordon Linoff

Reputation: 1270703

Presto offers the function max_by() which I think is in Athena as well:

select shelf, max(count), max_by(type, count)
from t
group by shelf

Upvotes: 0

Erin L

Reputation: 95

I tried to use ROW_NUMBER() and was able to solve this problem:

ranked AS (
  SELECT table.*, ROW_NUMBER() OVER (PARTITION BY othercols, shelf ORDER BY counts DESC) AS rn
  FROM table
)
SELECT * FROM ranked WHERE rn = 1;

Upvotes: 0

SQL: Select the largest value for a group

Answers (3)

Related Questions