Asgmch
Asgmch

Reputation: 15

HiveQL - Calculate Time Difference Between Two Rows Based On A Condition

I want to calculate the time difference for each ID like this: Time_difference1 is the difference in timestamps when status=4 - status=2, and Time_difference2 is the difference when status=3 - status=2.

My table looks like this

id  status  timestamp
16  1       12.45.12
16  2       12.45.30
16  3       12.45.55
16  4       12.46.15
11  1       12.45.46
11  2       12.45.55
11  3       12.46.11
11  4       12.46.34
27  1       12.48.01
27  2       12.48.18
27  3       12.48.42
27  4       12.48.52

So the result should look like this:

id  timediff1   timediff2
16  0.00.45     0.00.25
11  0.00.25     0.00.16
27  0.00.41     0.00.24

I've tried solutions like

SELECT id,
   status
   timestamp,
   (to_unix_timestamp(case1) - to_unix_timestamp(timestamp)) AS timediff1
FROM (
  SELECT t.*,
         CASE WHEN status=4 THEN timestamp END OVER (PARTITION BY id ORDER BY timestamp ASC) AS case1
  FROM table t 
)
WHERE status = 2

But it doesn't work. The OVER PARTITION BY part gives error: mismatched input 'FROM' expecting ; line 5 pos 0

Anyone have idea how to proceed?

Upvotes: 1

Views: 208

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

I want to calculate the time difference for each ID like this: Time_difference1 is the difference in timestamps when status=4 - status=2, and Time_difference2 is the difference when status=3 - status=2.

Use conditional aggregation:

SELECT id,
       (max(to_unix_timestamp(case when status = 4 then timestamp end)) - 
        max(to_unix_timestamp(case when status = 2 then timestamp end))
       ) AS timediff1,
       (max(to_unix_timestamp(case when status = 3 then timestamp end)) - 
        max(to_unix_timestamp(case when status = 2 then timestamp end)
       ) AS timediff2)
FROM t 
GROUP BY id

Upvotes: 1

Related Questions