MBZ
MBZ

Reputation: 27592

rank() over in hive

I'm converting a SQL Server stored procedure to HiveQL.

How can I convert something like:

SELECT 
    p.FirstName, p.LastName,
    RANK() OVER (ORDER BY a.PostalCode) AS Rank

Upvotes: 2

Views: 10141

Answers (3)

Sanjay Subramanian
Sanjay Subramanian

Reputation: 1529

No big deal Somehow in Hive 0.13.1 that comma does not work :-(

so I got it working without the comma

(PARTITION BY category country ORDER BY sales DESC)

Upvotes: 0

iggy
iggy

Reputation: 722

For anyone who comes across this question, Hive now supports rank() and other analytics functions. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Upvotes: 2

Charles Menguy
Charles Menguy

Reputation: 41428

I Have seen this use case a few times, there is a way to do something similar to RANK() in Hive using a UDF.

There are basically a few steps:

  • Partition the data into groups with DISTRIBUTE BY
  • Order the data in each group with SORT BY

There is actually a nice article on the topic, and you can also find some code from Edward Capriolo here.

Here is an example query doing a rank in Hive:

ADD JAR p-rank-demo.jar;
CREATE TEMPORARY FUNCTION p_rank AS 'demo.PsuedoRank';

SELECT
 category,country,product,sales,rank
FROM (
 SELECT
   category,country,product,sales,
   p_rank(category, country) rank
 FROM (
   SELECT
     category,country,product,
     sales
   FROM p_rank_demo
   DISTRIBUTE BY
     category,country
   SORT BY
     category,country,sales desc) t1) t2
WHERE rank <= 3

Which does the equivalent of the following query in MySQL:

SELECT
 category,country,product,sales,rank
FROM (
 SELECT
   category,country,product, sales,
   rank() over (PARTITION BY category, country ORDER BY sales DESC) rank
 FROM p_rank_demo) t
WHERE rank <= 3

Upvotes: 7

Related Questions