Reputation: 1458
I have a dataset looks like this:
---------------------------
cust | cost | cat | name
---------------------------
1 | 2.5 | apple | pkLady
---------------------------
1 | 3.5 | apple | greenGr
---------------------------
1 | 1.2 | pear | yelloPear
----------------------------
1 | 4.5 | pear | greenPear
-------------------------------
my hive query should now compare the cheapest price of each item the customer bought. So I want now to get the 2.5 and 1.2 into one row to get its difference. Since I am new to Hive I don't now how to ignore everything else until I reach next category of item while I still kept the cheapest price in the previous category.
Upvotes: 0
Views: 1197
Reputation: 17613
Given your options (brickhouse UDFs, hive windowing functions or a self-join) in Hive, a self-join is the worst way to do this.
select *
, (cost - min(cost) over (partition by cust)) cost_diff
from table
Upvotes: 2
Reputation: 3261
you can use like below:
select cat,min(cost) from table group by cost;
Upvotes: 2
Reputation: 36555
I think this is really a SQL question rather than a Hive question: If you just want the cheapest cost per customer you can do
select cust, min(cost)
group by cust
Otherwise if you want the cheapest cost per customer per category you can do:
select cust, cat, min(cost)
from yourtable
groupby cust, cat
Upvotes: 1
Reputation: 4563
You could create a subquery containing the minimum cost for each customer, and then join it to the original table:
select
mytable.*,
minCost.minCost,
cost - minCost as costDifference
from mytable
inner join
(select
cust,
min(cost) as minCost
from mytable
group by cust) minCost
on mytable.cust = minCost.cust
I created an interactive SQLFiddle example using MySQL, but it should work just fine in Hive.
Upvotes: 1