Reputation: 27592
I'm converting a SQL Server stored procedure to HiveQL.
How can I convert something like:
SELECT
p.FirstName, p.LastName,
RANK() OVER (ORDER BY a.PostalCode) AS Rank
Upvotes: 2
Views: 10141
Reputation: 1529
No big deal Somehow in Hive 0.13.1 that comma does not work :-(
so I got it working without the comma
(PARTITION BY category country ORDER BY sales DESC)
Upvotes: 0
Reputation: 722
For anyone who comes across this question, Hive now supports rank() and other analytics functions. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
Upvotes: 2
Reputation: 41428
I Have seen this use case a few times, there is a way to do something similar to RANK()
in Hive using a UDF.
There are basically a few steps:
DISTRIBUTE BY
SORT BY
There is actually a nice article on the topic, and you can also find some code from Edward Capriolo here.
Here is an example query doing a rank in Hive:
ADD JAR p-rank-demo.jar;
CREATE TEMPORARY FUNCTION p_rank AS 'demo.PsuedoRank';
SELECT
category,country,product,sales,rank
FROM (
SELECT
category,country,product,sales,
p_rank(category, country) rank
FROM (
SELECT
category,country,product,
sales
FROM p_rank_demo
DISTRIBUTE BY
category,country
SORT BY
category,country,sales desc) t1) t2
WHERE rank <= 3
Which does the equivalent of the following query in MySQL:
SELECT
category,country,product,sales,rank
FROM (
SELECT
category,country,product, sales,
rank() over (PARTITION BY category, country ORDER BY sales DESC) rank
FROM p_rank_demo) t
WHERE rank <= 3
Upvotes: 7