mlys

Reputation: 1

Split dataset when time intervals exceed specific value and assign a new trip ID to new groups

I have a dataset of GPS locations with corresponding Trip ID, date and time, and time intervals in minutes between successive points within each trip:

> example
 TripID            DATIM INTV
1   522 22/05/2010 11:05  120
2   522 22/05/2010 13:05  120
3   522 22/05/2010 15:05  120
4   522 22/05/2010 17:05  120
5   522 22/05/2010 19:05  120
6   522 22/05/2010 21:05  120
7    10 28/05/2010 11:05  120
8    10 28/05/2010 13:05  120
9    10 29/05/2010 09:05 1200
10   10 29/05/2010 11:05  120
11   10 29/05/2010 13:05  120
12   10 29/05/2010 15:05  120
13   10 29/05/2010 17:05  120
14  657 04/06/2010 11:05  120
15  657 04/06/2010 13:05  120
16  657 04/06/2010 15:05  120

I want to split the data within each trip wherever the time interval exceeds 240 min, and assign a new TripID to the new group. In my example, I want to assign a new trip ID to rows 9 to 13, because the time interval between rows 8 and 9 exceeds 240 min. The result should be the following dataset:

 TripID            DATIM INTV
1   522 22/05/2010 11:05  120
2   522 22/05/2010 13:05  120
3   522 22/05/2010 15:05  120
4   522 22/05/2010 17:05  120
5   522 22/05/2010 19:05  120
6   522 22/05/2010 21:05  120
7    10 28/05/2010 11:05  120
8    10 28/05/2010 13:05  120
9   333 29/05/2010 09:05 1200
10  333 29/05/2010 11:05  120
11  333 29/05/2010 13:05  120
12  333 29/05/2010 15:05  120
13  333 29/05/2010 17:05  120
14  657 04/06/2010 11:05  120
15  657 04/06/2010 13:05  120
16  657 04/06/2010 15:05  120

Here is the bit of code I started to write:

TripIDs<-unique(example$TripID)

for (i in seq_along(TripIDs)){ # seq_along() visits every trip; length() alone yields only the last index
  Trip<-example[which(example$TripID == TripIDs[i]),] #split by trip
  breaks<-Trip$INTV[Trip$INTV>=1200] #define the breaks
  groups<-cut(Trip$INTV,breaks = breaks) #cut the trip at defined breaks
  ddply(Trip,"groups",**function()**) # assign a new name to each group of the trip
}

My problem is with the ddply function, which requires a function that assigns a unique name to each new group of the trip. I am not sure ddply is appropriate here. Does anybody have an idea of how to split the data within a trip when time intervals exceed 240 min, and assign a unique Trip ID to each newly created group?

Many thanks

Upvotes: 0

Views: 74

Answers (1)

mef jons

Reputation: 302

I think a lot of problems involve [1] mapping out some condition using Boolean tests and subscripting and [2] split-apply-combine. I think it makes sense to do the split, apply, and combine steps by hand BEFORE using a clean abstraction like plyr::ddply, to build intuition and work through the problem at a granular level.

fix_id <- function(df) {
    if (any(df$INTV > 240)) df$TripID <- 999999
    # Assuming if ANY INTV in the group is > 240, make a new id for the group.
    # 999999 is an example id; you'll have to find a meaningful way to set ids.
    return(df)
}

splitted <- split(example, example$TripID)
applied <- lapply(splitted, fix_id)
combined <- plyr::rbind.fill(applied)

If that works as expected, then I might do this instead:

plyr::ddply(example, 'TripID', fix_id)
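Note that fix_id relabels the WHOLE trip when any interval exceeds 240, whereas the question wants only the rows from the gap onward to move. Here is a minimal base-R sketch of step [1] that marks where each new segment starts, assuming INTV has no missing values. The column names is_break and segment are my own, and the DATIM column is left out for brevity:

```r
# Rebuild the question's example data (TripID and INTV copied from the post).
example <- data.frame(
  TripID = c(rep(522, 6), rep(10, 7), rep(657, 3)),
  INTV   = c(rep(120, 8), 1200, rep(120, 7))
)

# [1] Boolean test: TRUE on rows where a new segment should start.
example$is_break <- example$INTV > 240

# Running count of breaks within each trip: 0 before the first gap,
# 1 from the first gap onward, 2 from the second gap onward, and so on.
example$segment <- ave(as.integer(example$is_break), example$TripID, FUN = cumsum)

# [2] Split on trip AND segment, so only the rows after a gap form a new group.
groups <- split(example, list(example$TripID, example$segment), drop = TRUE)
```

With this data, groups has four elements, and the one named "10.1" holds rows 9 to 13 of the original data frame.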

I'm NOT addressing how to assign meaningful new TripIDs, because I don't think I'm familiar enough with the problem. BUT one option is to use a closure (a function that returns a function) to keep state across calls of fix_id: it would start counting at the highest existing TripID and add one to the count every time it is called.
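A rough sketch of that idea, in base R only. The names make_id_generator, next_id and fix_id_by_segment are made up for illustration, and the code assumes INTV has no missing values and that ids just above the current maximum TripID are free to use:

```r
# Hypothetical helper: returns a function that hands out a fresh id on each
# call, starting just above `start`. The counter lives in the closure.
make_id_generator <- function(start) {
  current <- start
  function() {
    current <<- current + 1
    current
  }
}

# Example data from the question (DATIM omitted for brevity).
example <- data.frame(
  TripID = c(rep(522, 6), rep(10, 7), rep(657, 3)),
  INTV   = c(rep(120, 8), 1200, rep(120, 7))
)

next_id <- make_id_generator(max(example$TripID))

# Variant of fix_id: rows before the first gap keep the original TripID;
# every later segment of the trip gets a fresh id from the generator.
fix_id_by_segment <- function(df) {
  segment <- cumsum(df$INTV > 240)
  for (s in setdiff(unique(segment), 0)) {
    df$TripID[segment == s] <- next_id()
  }
  df
}

fixed <- do.call(rbind, lapply(split(example, example$TripID), fix_id_by_segment))
```

On the example data the five rows after the gap in trip 10 get the id 658, and every other row keeps its TripID. As with the split/lapply version above, the rows come back grouped by sorted TripID rather than in their original order.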

Upvotes: 0
