Reputation: 141
select d.order_type from migu_td_aaa_order_log_d d where exists(select 1
from migu_user r where r.user_id = '156210106' and r.user_num =
d.serv_number) and d.product_id in ('2028594290','2028596512','2028597138' )
order by d.opr_time desc limit 1
why the above sql failed ,indicates : FAILED: SemanticException [Error 10002]: Line 4:11 Invalid column reference 'opr_time'
but the below one works :
select temp.order_type from (
select d.* from migu_td_aaa_order_log_d d where exists(select 1 from
migu_user r where r.user_id = '156210106' and r.user_num = d.serv_number)
and d.product_id in ('2028594290','2028596512','2028597138' ) order by
d.opr_time desc limit 1) temp;
this one works fine ,too ,and much more efficient than the second one:
select d.* from migu_td_aaa_order_log_d d where exists(select 1 from
migu_user r where r.user_id = '156210106' and r.user_num = d.serv_number)
and d.product_id in ('2028594290','2028596512','2028597138' )
order by d.opr_time desc limit 1
I only need to get order_type field,so even though the second one works,but it cost much more time. Can anyone help me? Thanks a lot!
Upvotes: 2
Views: 7539
Reputation: 44961
Hive currently have an order by
limitation.
The current status of this issue is PATCH AVAILABLE
.
see -
"Can't order by an unselected column"
https://issues.apache.org/jira/browse/HIVE-15160
You might want to get familiar with LEFT SEMI JOIN
which is a cleaner syntax for EXISTS
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinSyntax
using min / max
over a struct / named_struct
can be used instead of order by ... asc / desc
and limit 1
Here is an alternative solution:
select max(named_struct('opr_time',d.opr_time,'order_type',d.order_type)).order_type
from migu_td_aaa_order_log_d d
left semi join migu_user r
on r.user_num =
d.serv_number
and r.user_id = '156210106'
where d.product_id in ('2028594290','2028596512','2028597138')
;
P.s.
You seriously want to consider to treat IDs (user_id
, product_id
) as numeric and not as strings.
Upvotes: 0
Reputation: 2469
Your first query does not work because, in the first select statement, you are just getting one column (d.order_type
), but you are trying to order by another column (d.opr_time
), which you have not included in your select
statement
select d.order_type from ...
...
order by d.opr_time desc limit 1
Note that if you added the column d.opr_time
to your first query, it would work:
select d.order_type, d.opr_time from ...
...
order by d.opr_time desc limit 1
Your second query works because, in the subquery, you have selected all the columns of d
(d.*
), so when you order by opr_time
, that column is present. (Same for the third query).
select temp.order_type from (
select d.* ... order by d.opr_time ...
EDITED:
According to the Hive documentation:
When using group by clause, the select statement can only include columns included in the group by clause. Of course, you can have as many aggregation functions (e.g. count) in the select statement as well.
So, this query:
select d.order_type, d.opr_time from ...
...
order by d.opr_time desc limit 1
Shouldn't work either, because the select clause has an additional column (d.order_type
) that is not included in the group by clause.
I hope this helps.
P.S. This answer about SQL execution order might be useful.
Upvotes: 2