How do I group records based the value of the next record?

Question

I am trying to group together asset tracking records based a date/time stamp and I'm stumped, hoping someone has an easy way to accomplish this.

I have a table that contains the following columns:

asset_id
date_time_stamp
zone_name

A sample dataset would look like this:

09 , 2016-02-01 18:57:23.000 , zone_0
10 , 2016-02-01 18:57:23.000 , zone_1
10 , 2016-02-01 18:57:34.000 , zone_1
10 , 2016-02-01 18:57:45.000 , zone_2
11 , 2016-02-01 18:56:17.000 , zone_3
11 , 2016-02-01 18:56:32.000 , zone_3
11 , 2016-02-01 18:56:43.000 , zone_3
11 , 2016-02-01 18:56:48.000 , zone_3
12 , 2016-02-01 18:56:43.000 , zone_1
12 , 2016-02-01 18:56:53.000 , zone_1
12 , 2016-02-01 18:57:54.000 , zone_3
12 , 2016-02-01 18:59:01.000 , zone_3
12 , 2016-02-01 18:59:14.000 , zone_1
12 , 2016-02-01 18:59:45.000 , zone_1
10 , 2016-02-01 18:58:23.000 , zone_1
10 , 2016-02-01 18:59:56.000 , zone_1

What I'd like to know is, how long did each asset spend in a zone before moving to the next zone? I can't group on zone_name because I want to know that an asset moved from zone 1 to zone 2 and then back to zone 1 again.

The output I am looking for is:

asset_id
zone_name
date_time_stamp_entered_zone
date_time_stamp_exited_zone

With a dataset of:

09 , zone_0 , 2016-02-01 18:57:23.000 , 2016-02-01 18:57:34.000
10 , zone_1 , 2016-02-01 18:57:23.000 , 2016-02-01 18:57:34.000
10 , zone_2 , 2016-02-01 18:57:45.000 , 2016-02-01 18:57:45.000
10 , zone_1 , 2016-02-01 18:58:23.000 , 2016-02-01 18:59:56.000
11 , zone_3 , 2016-02-01 18:56:43.000 , 2016-02-01 18:56:48.000
12 , zone_1 , 2016-02-01 18:56:43.000 , 2016-02-01 18:56:53.000
12 , zone_3 , 2016-02-01 18:57:54.000 , 2016-02-01 18:59:01.000
12 , zone_1 , 2016-02-01 18:59:14.000 , 2016-02-01 18:59:14.000

Gordon Linoff · Accepted Answer

You need to identify groups of zones that are the same. One technique is the "difference of row numbers" approach. This will assign a grouping id to each zone, which can then be used for aggregation:

select asset_id, zone_name, min(date_time_stamp), max(date_time_stamp)
from (select t.*,
             (row_number() over (partition by asset_id order by date_time_stamp) -
              row_number() over (partition by asset_id, zone_name order by date_time_stamp)
              ) as grp              
      from t
     ) t
group by asset_id, zone_name, grp;

The "magic" in this approach is the difference in row numbers. I would suggest that you run the subquery to see what it produces . . . you can also calculate the row numbers separately to better understand how it works.

How do I group records based the value of the next record?

Answers (1)

Related Questions