Reputation: 1458

how do retrieve specific row in Hive?

I have a dataset looks like this:

---------------------------
 cust | cost | cat   | name
---------------------------
  1   |  2.5 | apple | pkLady
---------------------------
  1   |  3.5 | apple | greenGr
---------------------------
  1   |  1.2 | pear  | yelloPear
----------------------------
  1   |  4.5 | pear  |  greenPear
-------------------------------

my hive query should now compare the cheapest price of each item the customer bought. So I want now to get the 2.5 and 1.2 into one row to get its difference. Since I am new to Hive I don't now how to ignore everything else until I reach next category of item while I still kept the cheapest price in the previous category.

Upvotes: 0

Answers (4)

o-90

Reputation: 17613

Given your options (brickhouse UDFs, hive windowing functions or a self-join) in Hive, a self-join is the worst way to do this.

select *
  , (cost - min(cost) over (partition by cust)) cost_diff
from table

Upvotes: 2

Aman

Reputation: 3261

you can use like below:

select cat,min(cost) from table group by cost;

Upvotes: 2

maxymoo

Reputation: 36555

I think this is really a SQL question rather than a Hive question: If you just want the cheapest cost per customer you can do

select cust, min(cost)
group by cust

Otherwise if you want the cheapest cost per customer per category you can do:

select cust, cat, min(cost)
from yourtable
groupby cust, cat

Upvotes: 1

Alex Woolford

Reputation: 4563

You could create a subquery containing the minimum cost for each customer, and then join it to the original table:

select
    mytable.*,
    minCost.minCost,
    cost - minCost as costDifference
from mytable
inner join
    (select
       cust,
       min(cost) as minCost
    from mytable
    group by cust) minCost
on mytable.cust = minCost.cust

I created an interactive SQLFiddle example using MySQL, but it should work just fine in Hive.

Upvotes: 1

how do retrieve specific row in Hive?

Answers (4)

Related Questions