Reputation: 1
I have a dataset of GPS locations with corresponding Trip ID, date and time, and time intervals in minutes between successive points within each trip:
> example
TripID DATIM INTV
1 522 22/05/2010 11:05 120
2 522 22/05/2010 13:05 120
3 522 22/05/2010 15:05 120
4 522 22/05/2010 17:05 120
5 522 22/05/2010 19:05 120
6 522 22/05/2010 21:05 120
7 10 28/05/2010 11:05 120
8 10 28/05/2010 13:05 120
9 10 29/05/2010 09:05 1200
10 10 29/05/2010 11:05 120
11 10 29/05/2010 13:05 120
12 10 29/05/2010 15:05 120
13 10 29/05/2010 17:05 120
14 657 04/06/2010 11:05 120
15 657 04/06/2010 13:05 120
16 657 04/06/2010 15:05 120
I want to split the data within each trip wherever the time interval exceeds 240 min, and assign a new TripID to the new group. In my example, I want to assign a new TripID to rows 9 to 13, because the time interval between rows 8 and 9 exceeds 240 min. The result should be the following dataset:
TripID DATIM INTV
1 522 22/05/2010 11:05 120
2 522 22/05/2010 13:05 120
3 522 22/05/2010 15:05 120
4 522 22/05/2010 17:05 120
5 522 22/05/2010 19:05 120
6 522 22/05/2010 21:05 120
7 10 28/05/2010 11:05 120
8 10 28/05/2010 13:05 120
9 333 29/05/2010 09:05 1200
10 333 29/05/2010 11:05 120
11 333 29/05/2010 13:05 120
12 333 29/05/2010 15:05 120
13 333 29/05/2010 17:05 120
14 657 04/06/2010 11:05 120
15 657 04/06/2010 13:05 120
16 657 04/06/2010 15:05 120
Here is the bit of code I started to write:
TripIDs<-unique(example$TripID)
for (i in length(TripIDs)){
Trip<-example[which(example$TripID == TripIDs[i]),] #split by trip
breaks<-Trip$INTV[Trip$INTV>=1200] #define the breaks
groups<-cut(Trip$INTV,breaks = breaks) #cut the trip at defined breaks
ddply(Trip,"groups",**function()**) # assign a new name to each group of the trip
}
My problem is with ddply, which requires a function that assigns a unique name to each new group of the trip. I am not sure ddply is appropriate here. Does anybody have an idea how to split the data within a trip when the time interval exceeds 240 min, and assign a unique TripID to each newly created group?
Many thanks
Upvotes: 0
Views: 74
Reputation: 302
I think a lot of problems involve [1] mapping out some condition using Boolean tests and subscripting and [2] split-apply-combine. I think it makes sense to split, apply, and combine it myself BEFORE using a clean abstraction like plyr::ddply, to build intuition and work through the problem on a granular level.
fix_id <- function(df) {
if (any(df$INTV > 240)) df$TripID <- 999999
# Assuming if ANY INTV in the group is > 240, make a new id for the group.
# 999999 is an example id; you'll have to find a meaningful way to set ids.
return(df)
}
splitted <- split(example, example$TripID)
applied <- lapply(splitted, fix_id)
combined <- plyr::rbind.fill(applied)
If that works as expected, then I might do this instead:
plyr::ddply(example, 'TripID', fix_id)  # same split-apply-combine in one call, on the question's data frame
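One caveat about fix_id: it relabels every row of a trip that contains a long gap, whereas the question asks for only the rows from the gap onward to get a new ID. Here is a rough base-R sketch of that row-level split. It assumes INTV on a row is the gap since the previous row of the same trip. The ID scheme (highest existing TripID plus a counter) and the NewTripID column are placeholders I made up, not something from the question.

```r
# Small stand-in for the question's `example` (a few rows copied from it).
example <- data.frame(
  TripID = c(10, 10, 10, 10, 657),
  DATIM  = c("28/05/2010 11:05", "28/05/2010 13:05", "29/05/2010 09:05",
             "29/05/2010 11:05", "04/06/2010 11:05"),
  INTV   = c(120, 120, 1200, 120, 120),
  stringsAsFactors = FALSE
)

# Within each trip, count the gaps > 240 min seen so far.
# Segment 0 is the original trip; 1, 2, ... are the pieces after each long gap.
long_gap <- as.integer(example$INTV > 240)
segment  <- ave(long_gap, example$TripID, FUN = cumsum)

# Placeholder ID scheme (an assumption): number the new pieces upward
# from the highest existing TripID, in order of first appearance.
key       <- paste(example$TripID, segment, sep = "_")
needs_new <- segment > 0
new_keys  <- unique(key[needs_new])

example$NewTripID <- example$TripID
example$NewTripID[needs_new] <-
  max(example$TripID) + match(key[needs_new], new_keys)
```

With the real data, rows before the first long gap keep their original TripID, and each later piece of a trip gets its own new one.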
I'm NOT addressing how to meaningfully assign new TripIDs because I don't think I'm familiar enough with the problem. BUT one option is to use a function operator to maintain state across different calls of fix_id; so it'd start counting at the highest value of TripIDs and add one to the count every time it's called.
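A minimal sketch of that stateful idea, using a closure in base R. The name make_fix_id, its arguments, and the toy data are hypothetical. Like fix_id above, it relabels a whole trip rather than only the rows after the gap. Unlike the description, it bumps the counter only when a trip actually needs a new ID, not on every call.

```r
# Factory that returns a fix_id-style function with its own counter.
make_fix_id <- function(start_id, threshold = 240) {
  next_id <- start_id                 # state kept in the enclosing environment
  function(df) {
    if (any(df$INTV > threshold)) {
      next_id <<- next_id + 1         # update the counter across calls
      df$TripID <- next_id
    }
    df
  }
}

# Toy data (made up for illustration): trips 10 and 657 each contain a long gap.
example <- data.frame(
  TripID = c(522, 522, 10, 10, 657, 657),
  INTV   = c(120, 120, 120, 1200, 120, 300)
)

# Start counting at the highest existing TripID.
fix_id   <- make_fix_id(start_id = max(example$TripID))
splitted <- split(example, example$TripID)   # groups come back sorted by TripID
applied  <- lapply(splitted, fix_id)
combined <- do.call(rbind, applied)
```

Because split() orders the groups by TripID, new IDs are handed out in that order, not in the order the trips appear in the data.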
Upvotes: 0