Surender Raja
Surender Raja

Reputation: 3609

Hive Join Query throws Error

If a user_id has multiple records then keep the latest record as per event_timestamp .

My HIVE query :

SELECT 
 a.user_id,
 unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM') as  converted_event_timestamp,
 a.user_name,
 a.user_location
 FROM 
 sports_views a
 INNER JOIN
 (SELECT user_id,MAX(unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM')) as max_event_timestamp FROM sports_views GROUP BY user_id )b
 ON( a.user_id =b.user_id AND a.converted_event_timestamp =b.max_event_timestamp)
 LIMIT 10;

When I try to run this hive query I get the below error

SemanticException [Error 10002]: Line 8:43 Invalid column reference 'converted_event_timestamp'

Can Someone tell me what went wrong on this hive query and how do I fix this ?

Upvotes: 0

Views: 92

Answers (4)

overflow
overflow

Reputation: 79

select a.userid, a.unix_timestamp(event_timestamp,'yyyy/MM/dd hh:MM') as min 
    from sports_views a 
    inner join (select userid, first_value(1) over(order by time desc) as max 
        from(select userid,unix_timestamp(event_timestamp,'yyyy/MM/dd hh:MM') as time 
        from sports_views)sv )e) 
where a.min=max

Upvotes: 0

HLGEM
HLGEM

Reputation: 96640

I see where you have named a.converted_event_timestamp in your query. You can't use it in the join because that is likely evaluated before the conversions in the select. Join to this instead

unix_timestamp(a.event_timestamp,'dd/MM/YYYY HH:MM')

Upvotes: 1

Elias
Elias

Reputation: 47

SELECT a.user_id,a.user_name, a.user_location from

(SELECT a.user_id, unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM') as converted_event_timestamp, a.user_name, a.user_location FROM sports_views ) a INNER JOIN (SELECT user_id,MAX(unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM')) as max_event_timestamp FROM sports_views GROUP BY user_id )b ON( a.user_id =b.user_id AND a.converted_event_timestamp =b.max_event_timestamp) LIMIT 10;

Upvotes: 1

leftjoin
leftjoin

Reputation: 38335

It does not support join by alias. Use unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM') in join condition instead or calculate it in a subquery.

ON( a.user_id =b.user_id AND unix_timestamp(a.event_timestamp,'dd/MM/YYYY HH:MM')=b.max_event_timestamp)

Upvotes: 0

Related Questions