Reputation: 763
I have a table like below in Hive
.
I want to calculate the time difference in seconds
for columns where id
is same and get the value in time_diff
column.
Table
+-----+---------+------------------------+
| id | event | eventdate |
+-----+---------+------------------------+
| 1 | sent | 2017-11-23 03:49:50.0 |
| 1 | sent | 2017-11-23 03:49:59.0 |
| 2 | sent | 2017-11-23 04:49:59.0 |
| 1 | click | 2017-11-24 03:49:50.0 |
+-----+---------+------------------------+
I have done like below
SELECT *, coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)),0) time_diff FROM Table;
Result
+-----+---------+------------------------+-----------+
| id | event | eventdate |time_diff |
+-----+---------+------------------------+-----------+
| 1 | sent | 2017-11-23 03:49:50.0 | 0 |
| 1 | sent | 2017-11-23 03:49:59.0 | 9 |
| 2 | sent | 2017-11-23 04:49:59.0 | 0 |
| 1 | click | 2017-11-24 03:49:50.0 | 86391 |
+-----+---------+------------------------+-----------+
I am getting what I want but with a small exception. In the result where id
is 1
and event
is sent
in the time_diff
column there are two values 0
and 9
. I want all the sent
events to have 0
in the time_diff
column after we apply the lag function.
Expected result
:
+-----+---------+------------------------+-----------+
| id | event | eventdate |time_diff |
+-----+---------+------------------------+-----------+
| 1 | sent | 2017-11-23 03:49:50.0 | 0 |
| 1 | sent | 2017-11-23 03:49:59.0 | 0 |
| 2 | sent | 2017-11-23 04:49:59.0 | 0 |
| 1 | click | 2017-11-24 03:49:50.0 | 86391 |
+-----+---------+------------------------+-----------+
How can I get the expected result?
Upvotes: 0
Views: 552
Reputation: 1271131
You can use a case
expression:
SELECT *,
(case when event = 'sent' then 0
else coalesce(unix_timestamp(eventdate) - unix_timestamp(LAG(eventdate) OVER(PARTITION BY ID ORDER BY eventdate)), 0)
end) as time_diff
FROM Table;
Upvotes: 1