Reputation: 511
I have data in Hive table which look something like this -
VIN | Mode | event | Start | End |
---|---|---|---|---|
ABC123456789 | Mode 1 | Deauthorized | 01/01/2010 00:00:00 | 05/05/2014 14:54:54 |
ABC123456789 | Mode 1 | Deauthorized | 05/05/2014 14:54:54 | 05/13/2014 19:09:51 |
ABC123456789 | Mode 1 | Deauthorized | 05/13/2014 19:09:51 | 11/13/2014 22:26:32 |
ABC123456789 | Mode 1 | Authorized | 11/13/2014 22:26:32 | 11/13/2014 22:31:00 |
ABC123456789 | Mode 1 | Authorized | 11/13/2014 22:31:00 | 11/14/2014 01:23:56 |
ABC123456789 | Mode 2 | Deauthorized | 11/14/2014 01:23:56 | 11/18/2014 19:38:51 |
ABC123456789 | Mode 2 | Deauthorized | 11/18/2014 19:38:51 | 11/18/2014 19:38:54 |
ABC123456789 | Mode 2 | Deauthorized | 11/18/2014 19:38:54 | 11/18/2014 20:07:52 |
ABC123456789 | Mode 2 | Authorized | 11/18/2014 20:07:52 | 12/17/2014 19:22:50 |
ABC123456789 | Mode 2 | Authorized | 12/17/2014 19:22:50 | 02/25/2015 20:03:44 |
ABC123456789 | Mode 2 | Authorized | 02/25/2015 20:03:44 | 02/25/2015 20:03:48 |
ABC123456789 | Mode 3 | Authorized | 02/25/2015 20:03:48 | 02/25/2015 20:14:05 |
ABC123456789 | Mode 3 | Deauthorized | 02/25/2015 20:14:05 | 02/25/2015 20:14:29 |
ABC123456789 | Mode 3 | Deauthorized | 02/25/2015 20:14:29 | 02/25/2015 20:40:21 |
I would like to get a summarized data where the value in event column has changed from previous value. The data points are in ascending order of the Start timestamp. I tried window function but it didn't work for me. The result would look similar to what I have shown in below table. Can you suggest any optimized solution for this?
VIN | Mode | event | Start | End |
---|---|---|---|---|
ABC123456789 | Mode 1 | Deauthorized | 01/01/2010 00:00:00 | 05/05/2014 14:54:54 |
ABC123456789 | Mode 1 | Authorized | 11/13/2014 22:26:32 | 11/13/2014 22:31:00 |
ABC123456789 | Mode 2 | Deauthorized | 11/14/2014 01:23:56 | 11/18/2014 19:38:51 |
ABC123456789 | Mode 2 | Authorized | 11/18/2014 20:07:52 | 12/17/2014 19:22:50 |
ABC123456789 | Mode 3 | Deauthorized | 02/25/2015 20:14:05 | 02/25/2015 20:14:29 |
Upvotes: 0
Views: 112
Reputation: 1269843
You can use lag()
:
select t.*
from (select t.*,
lag(event) over (partition by vin order by start) as prev_event
from t
) t
where prev_event is null or prev_event <> event;
This looks at the changes by time and vin
. I'm not sure if the mode
is relevant too. If so, add it to the partition by
.
Upvotes: 2