Reputation: 761
Due Hive software constraints, I don't have the luxury of using RANK(), PARTITION BY, and OVER().
What I want to do is retrieve the first row from each group.
The table schema is as follows:
# Field, Type, Null, Key, Default, Extra
'prod_id', 'int(11)', 'NO', 'PRI', NULL, ''
'brand', 'varchar(20)', 'YES', '', NULL, ''
'name', 'varchar(75)', 'YES', '', NULL, ''
'price', 'int(11)', 'YES', '', NULL, ''
'cost', 'int(11)', 'YES', '', NULL, ''
'shipping_wt', 'smallint(6)', 'YES', '', NULL, ''
My current solution via SQL is:
select brand,name,price from
(select brand,name,price,row_number()
over(partition by brand order by price desc) as rn
from products) as sub
where rn=1;
How do I do so without the aforementioned functions and keywords?
Upvotes: 1
Views: 885
Reputation: 1271111
You can use join
and aggregation:
select p.*
from products p join
(select p.brand, min(p.price) as min_price
from products p
group by p.brand
) pp
on p.brand = pp.brand and p.price = pp.min_price;
This is actually equivalent to rank()
rather than row_number()
, but it is the same if there are no duplicate prices.
I strongly recommend using the window functions, though. Not only are they standard and an integral part of Hive, but they should be faster too.
Upvotes: 2