Ipek
Ipek

Reputation: 23

Hive - calculating string type timestamp differences in minutes

I'm novice to SQL (in hive) and trying to calculate every anonymousid's time spent between first event and last event in minutes. The resource table's timestamp is formatted as string, like: "2020-12-24T09:47:17.775Z". I've tried in two ways:

1- Cast column timestamp to bigint and calculated the difference from main table.

select anonymousid, max(from_unixtime(cast('timestamp' as bigint)) - min(from_unixtime(cast('timestamp' as bigint)) from db1.formevent group by anonymousid

I got NULLs after implementing this as a solution.

2- Create a new table from main resource, put conditions to call with 'where' and tried to convert 'timestamp' to date format without any min-max calculation.

create table db1.successtime as select anonymousid, pagepath,buttontype, itemname, 'location', cast(to_date(from_unixtime(unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSS"),'HH:mm:ss') as date) from db1.formevent where pagepath = "/account/sign-up/" and itemname = "Success" and 'location' = "Standard"

Then I got NULLs again and I left. It looks like this

Is there any way I can reformat and calculate time difference in minutes between first and last event ('timestamp') and take the average grouped by 'location'?

Upvotes: 1

Views: 655

Answers (2)

leftjoin
leftjoin

Reputation: 38290

select anonymousid,
       (max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) - 
        min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) 
       ) / 60
from db1.formevent
group by anonymousid;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269483

From your description, this should work:

select anonymousid,
       (max(unix_timestamp(timestamp, 'yyyy-MM-dd'T'HH:mm:ss.SSS'),'HH:mm:ss') - 
        min(unix_timestamp(timestamp, 'yyyy-MM-dd'T'HH:mm:ss.SSS'),'HH:mm:ss') 
       ) / 60
from db1.formevent
group by anonymousid;

Note that the column name is not in single quotes.

Upvotes: 0

Related Questions