Reputation: 95
I have a table like this:
someOtherCols | shelf | type | count |
---|---|---|---|
... | row1 | A | 2 |
... | row1 | B | 3 |
... | row2 | C | 2 |
... | row2 | D | 2 |
I would like to group by shelf, and only keep the type that has the highest count. If there's a tie, choose any row.
So the result would be like:
someOtherCols | shelf | type | count |
---|---|---|---|
... | row1 | B | 3 |
... | row2 | C | 2 |
I'm using AWS Athena and I'm trying the following query that I saw from another answer I saw:
SELECT * FROM table
WHERE count IN (MAX(count) FROM shoppingAggregate GROUP BY someOtherCols, shelf)
Seems like Athena does not like it. How can I achieve this? Thanks a lot!
Upvotes: 0
Views: 747
Reputation: 4481
Use window functions:
you will have to do an over for each attribute in "some other cols". I used min below (you have to make the query deterministic). You can choose what you prefer.
select distinct -- you need to remove duplicates
min(othercol) over (partition by row),
row, max(type) over (partition by row),
count(*) over (partition by row)
from relation;
sqlite will allow you to have a non-determinist query that will be significantly simpler using a group by (but it will not work in other dbms)
Upvotes: 1
Reputation: 1270703
Presto offers the function max_by()
which I think is in Athena as well:
select shelf, max(count), max_by(type, count)
from t
group by shelf
Upvotes: 0
Reputation: 95
I tried to use ROW_NUMBER() and was able to solve this problem:
ranked AS (
SELECT table.*, ROW_NUMBER() OVER (PARTITION BY othercols, shelf ORDER BY counts DESC) AS rn
FROM table
)
SELECT * FROM ranked WHERE rn = 1;
Upvotes: 0