duber

Reputation: 2869

How do I subtract the previous row from the current row within a group in Hive?

I've got a table full of server requests like this:

User ID|Timestamp
010101|01-01-14 12:00:00 AM
010101|01-01-14 12:00:10 AM
010101|01-01-14 12:00:30 AM
020101|01-01-14 12:00:00 AM
020101|01-01-14 12:01:00 AM
020101|01-01-14 12:01:20 AM

I'd like to find the lag between consecutive requests from the same user. The resulting table would look something like this (assuming an intermediate step converts the timestamps to Unix format):

User ID|Seconds from last request
010101|0
010101|10  --12:00:10 - 12:00:00
010101|20  --12:00:30 - 12:00:10
020101|0
020101|60  --12:01:00 - 12:00:00 
020101|20  --12:01:20 - 12:01:00

Is there a way to do this in Hive?

Upvotes: 1

Views: 546

Answers (1)

duber

Reputation: 2869

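One solution is to upgrade to a version of Hive that includes the windowing and analytics functions (added in Hive 0.11) and use the LAG function, partitioned by user ID and ordered by timestamp. See this JIRA ticket.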
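For illustration, a minimal sketch of how LAG could be applied to the sample data in the question. The table name `requests` and the column names `user_id` and `ts` are hypothetical, and the timestamp pattern assumes the strings are month-day-year with a 12-hour clock; adjust both to match the real schema.

```sql
-- Hypothetical table: requests(user_id STRING, ts STRING)
-- Assumed timestamp format: 'MM-dd-yy hh:mm:ss a', e.g. '01-01-14 12:00:10 AM'
SELECT
  user_id,
  -- LAG(col, 1, col) returns the previous row's value within the partition;
  -- for a user's first row it falls back to the current value, so the lag is 0.
  unix_ts - LAG(unix_ts, 1, unix_ts)
              OVER (PARTITION BY user_id ORDER BY unix_ts) AS seconds_from_last_request
FROM (
  -- Intermediate step: convert the timestamp string to Unix seconds.
  SELECT
    user_id,
    unix_timestamp(ts, 'MM-dd-yy hh:mm:ss a') AS unix_ts
  FROM requests
) t;
```

Passing the column itself as the third argument to LAG is what makes the first request per user show 0 rather than NULL. If NULL is preferable for "no previous request", drop the second and third arguments.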

Upvotes: 1
