Holmes
Holmes

Reputation: 1079

Select top 2 rows in Hive

I'm trying to retrieve top 2 tables from my employee list based on salary in hive (version 0.11). Since it doesn't support TOP function, is there any alternatives? Or do we have define a UDF?

Upvotes: 34

Views: 155211

Answers (3)

INDRAJITH EKANAYAKE
INDRAJITH EKANAYAKE

Reputation: 4274

Here I think it's worth mentioning SORT BY and ORDER BY both clauses and why they different,

SELECT * FROM <table_name> SORT BY <column_name> DESC LIMIT 2

If you are using SORT BY clause it sort data per reducer which means if you have more than one MapReduce task it will result partially ordered data. On the other hand, the ORDER BY clause will result in ordered data for the final Reduce task. To understand more please refer to this link.

SELECT * FROM <table_name> ORDER BY <column_name> DESC LIMIT 2

Note: Finally, Even though the accepted answer contains SORT BY clause, I mostly prefer to use ORDER BY clause for the general use case to avoid any data loss.

Upvotes: 5

Ashish Doneriya
Ashish Doneriya

Reputation: 1863

select * from employee_list order by salary desc limit 2;

Upvotes: 3

Akshay Shrivastava
Akshay Shrivastava

Reputation: 774

Yes, here you can use LIMIT.

You can try it by the below query:

SELECT * FROM employee_list SORT BY salary DESC LIMIT 2

Upvotes: 75

Related Questions