Reputation: 72
I have the following data in Hive (sorted by Count column):
ItemX ItemXX Count
----- ------ -----
item1 item11 8
item1 item12 2
item1 item13 1
item2 item21 10
item2 item22 1
item3 item31 9
item3 item32 7
item3 item33 5
I need to filter max Count lines only. Like this:
item1 item11 8
item2 item21 10
item3 item31 9
How can I do this in Hive QL?
Upvotes: 1
Views: 60
Reputation: 4375
In common cases (without sorting) you can use these queries:
-- 1.
SELECT a.* FROM tbl a
INNER JOIN (
SELECT ItemX, MAX(Count) Count
FROM tbl
GROUP BY ItemX
) b ON a.ItemX = b.ItemX AND a.Count = b.Count;
-- 2.
SELECT a.* FROM tbl a
LEFT OUTER JOIN tbl b
ON a.ItemX = b.ItemX AND a.Count < b.Count
WHERE b.ItemX IS NULL;
Note: Second query can be execute with Hive 2.2.0+ because complex expressions in ON
clause are not supported before this version (see Hive wiki).
Upvotes: 1
Reputation: 38290
Use row_number()
function:
select ItemX, ItemXX, Count
from
(
select ItemX, ItemXX, Count,
row_number() over(partition by ItemX order by Count desc) rn
from table_name
)s
where rn=1
Upvotes: 3