Reputation: 5737
I have a table like access(url, access_time), with possibly many access times per url.
I have another table that is asset(url, foo)
I want to do a query that turns this into joined_data(url, first_access_time, second_access_time)
Where first_access_time is NULL if there's no access time, and second_access_time is NULL if there's no 2nd access time.
How could I do this in hive?
Upvotes: 0
Views: 83
Reputation: 49260
You can use row_number
to do this.
with twotimes as (select ast.url, a.access_time,
row_number() over(partition by a.url order by a.access_time) as rn
from asset ast
left join access a on a.url = ast.url )
select url, max(first_access_time), max(second_access_time)
from (
select url, access_time as first_access_time, null as second_access_time
from twotimes where rn = 1
union all
select url, null as first_access_time, access_time as second_access_time
from twotimes where rn = 2
) t
group by url
Upvotes: 1