Reputation: 492
I have a table in Hive that shows the login time (connect_date) for every user (user_id) of our system. Here is the table structure:
hive> describe access_log;
OK
user_id double from deserializer
connect_date timestamp from deserializer
ip string from deserializer
logout_date timestamp from deserializer
session_id string from deserializer
My goal is to find the avg time between logins for each user over a specific time period, lets use the last 7 days for this question.
How would the Hive Query look to achieve this analysis of the data? Thanks.
Upvotes: 2
Views: 3288
Reputation: 356
There are a few ways of doing this:
1) You could aggregate the connect_dates into an array and write a UDF to parse an array of dates/timestamps into the average time between.
2) You could use the windowing functions introduced in hive 0.12 - this is much simpler
select
user_id, avg(connect_date-lag_connect_date)
from
(
select
user_id, connect_date, lag(connect_date) over (partition by user_id order by connect_date) as lag_connect_date
from
access_log
) a
group by
user_id
Upvotes: 4