Rick
Rick

Reputation: 141

Error while compiling statement: FAILED: SemanticException [Error 10002]

select d.order_type from migu_td_aaa_order_log_d d where  exists(select 1 
from migu_user r where r.user_id = '156210106' and r.user_num = 
d.serv_number) and d.product_id in ('2028594290','2028596512','2028597138' ) 
order by d.opr_time desc limit 1 

why the above sql failed ,indicates : FAILED: SemanticException [Error 10002]: Line 4:11 Invalid column reference 'opr_time'

but the below one works :

select temp.order_type from (
select d.* from migu_td_aaa_order_log_d d where  exists(select 1 from 
migu_user r where r.user_id = '156210106' and r.user_num = d.serv_number) 
and d.product_id in ('2028594290','2028596512','2028597138' ) order by 
d.opr_time desc limit 1) temp;

this one works fine ,too ,and much more efficient than the second one:

select d.* from migu_td_aaa_order_log_d d where  exists(select 1 from 
migu_user r where r.user_id = '156210106' and r.user_num = d.serv_number) 
and d.product_id in ('2028594290','2028596512','2028597138' ) 
order by d.opr_time desc limit 1

I only need to get order_type field,so even though the second one works,but it cost much more time. Can anyone help me? Thanks a lot!

Upvotes: 2

Views: 7539

Answers (2)

David דודו Markovitz
David דודו Markovitz

Reputation: 44961

1.

Hive currently have an order by limitation.
The current status of this issue is PATCH AVAILABLE.

see -
"Can't order by an unselected column"
https://issues.apache.org/jira/browse/HIVE-15160

2.

You might want to get familiar with LEFT SEMI JOIN which is a cleaner syntax for EXISTS https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinSyntax

3.

using min / max over a struct / named_struct can be used instead of order by ... asc / desc and limit 1

Here is an alternative solution:

select      max(named_struct('opr_time',d.opr_time,'order_type',d.order_type)).order_type

from                        migu_td_aaa_order_log_d d 
            
            left semi join  migu_user               r 
            
            on              r.user_num  =  
                            d.serv_number 
                        
                        and r.user_id   = '156210106' 

where       d.product_id in ('2028594290','2028596512','2028597138') 
;

P.s.

You seriously want to consider to treat IDs (user_id, product_id) as numeric and not as strings.

Upvotes: 0

Jaime Caffarel
Jaime Caffarel

Reputation: 2469

Your first query does not work because, in the first select statement, you are just getting one column (d.order_type), but you are trying to order by another column (d.opr_time), which you have not included in your select statement

select d.order_type from ...
...
order by d.opr_time desc limit 1

Note that if you added the column d.opr_time to your first query, it would work:

select d.order_type, d.opr_time from ...
...
order by d.opr_time desc limit 1

Your second query works because, in the subquery, you have selected all the columns of d (d.*), so when you order by opr_time, that column is present. (Same for the third query).

select temp.order_type from (
select d.* ... order by d.opr_time ...

EDITED:

According to the Hive documentation:

When using group by clause, the select statement can only include columns included in the group by clause. Of course, you can have as many aggregation functions (e.g. count) in the select statement as well.

So, this query:

select d.order_type, d.opr_time from ...
...
order by d.opr_time desc limit 1

Shouldn't work either, because the select clause has an additional column (d.order_type) that is not included in the group by clause.

I hope this helps.

P.S. This answer about SQL execution order might be useful.

Upvotes: 2

Related Questions