Reputation: 87
I have some data like:-
ID PRICE
1 100
2 200
3 120
4 130
5 320
6 300
7 200
8 100
9 120
10 250
I need to find top 20% price.
Expected output:-
ID PRICE
5 320
6 300
Upvotes: 0
Views: 2712
Reputation: 1
Heres a way in which you're able to do it without having to use join
.
Select id,price from (select id,price, row_number() over(order by price desc) r1,count(*) over()*(20/100) ct from table_name)final where r1<=ct ;
Upvotes: 0
Reputation: 38290
You can do it without joins. Use analytic function to calculate max(price)
, take 80%, then use filter price>80%:
with your_data as ( --this is your data
select stack(10,
1 , 100,
2 , 200,
3 , 120,
4 , 130,
5 , 320,
6 , 300,
7 , 200,
8 , 100,
9 , 120,
10, 250) as (ID, PRICE)
)
select id, price
from
(
select d.*, max(price) over()*0.8 as pct_80 from your_data d
)s where price>pct_80
Result:
OK
id price
6 300
5 320
Use your table instead of WITH
subquery, add order by ID if necessary.
Upvotes: 1
Reputation: 1197
Below would be the queries -
with top_20 as (
select
max(price)*0.8 as price1
from
<tableName>
)
select * from <tableName> t1 , top_20 t2 where t1.price > t2.price1;
select
name,
price
from
(select
name,
price,
max(price)*0.8 over (order by price) as top_20
from <tableName>
) t1
where
t1.price > t1.top_20;
Below query will not work in hive -
select * from <tableName> where price > (select max(salary)*0.8 from <tableName>)
select * from <tableName> t1 where exists (select salary from <tablename> t2 where t1.salary > t2.salary*0.8)
Reason - Hive does not support sub query in where clause with equal condition, it supports with only IN, NOT IN , EXISTS and NOT EXISTS.
Even with Exists and NOT Exists, it supports only Equijoin, Refer https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries#LanguageManualSubQueries-SubqueriesintheWHEREClause for more details
Hope this helps.
Upvotes: 0