Reputation: 1079
I'm trying to retrieve top 2 tables from my employee list based on salary in hive (version 0.11). Since it doesn't support TOP function, is there any alternatives? Or do we have define a UDF?
Upvotes: 34
Views: 155211
Reputation: 4274
Here I think it's worth mentioning SORT BY
and ORDER BY
both clauses and why they different,
SELECT * FROM <table_name> SORT BY <column_name> DESC LIMIT 2
If you are using SORT BY
clause it sort data per reducer which means if you have more than one MapReduce task it will result partially ordered data. On the other hand, the ORDER BY
clause will result in ordered data for the final Reduce task. To understand more please refer to this link.
SELECT * FROM <table_name> ORDER BY <column_name> DESC LIMIT 2
Note: Finally, Even though the accepted answer contains SORT BY
clause, I mostly prefer to use ORDER BY
clause for the general use case to avoid any data loss.
Upvotes: 5
Reputation: 774
Yes, here you can use LIMIT
.
You can try it by the below query:
SELECT * FROM employee_list SORT BY salary DESC LIMIT 2
Upvotes: 75