Reputation: 1227
I have a large sample of time-stamped GPS data for a number of vehicles in text-file format. Each vehicle's data has a unique ID. I created a Pandas dataframe easily enough and then realized that each vehicle's GPS data is effectively one continuous track spanning several months.
What I would like to do is isolate individual journeys by splitting the track wherever the gap in GPS reporting exceeds a certain delta (e.g. 10 minutes). I don't think I can assume that the position is unchanged between the end of one journey and the beginning of the next (although it *should* be very close).
uid ts lat lon
ABC 2017-01-01 00:00:00 0.0000 0.0000
ABC 2017-01-01 00:00:05 0.0000 0.0100
ABC 2017-01-01 00:00:10 0.0000 0.0200
ABC 2017-01-01 00:10:00 0.0100 0.0300 <--- New Journey. 10 min delta
ABC 2017-01-01 00:10:05 0.0100 0.0400
ABC 2017-01-01 00:10:10 0.0100 0.0500
ABC 2017-01-01 00:10:15 0.0100 0.0600
DEF 2017-01-01 20:00:00 1.0000 1.0000
DEF 2017-01-01 20:00:05 1.0000 1.0100
DEF 2017-01-01 20:00:10 1.0000 1.0200
DEF 2017-01-01 20:20:00 1.0100 1.0300 <--- New Journey. 20 min delta
DEF 2017-01-01 20:20:05 1.0100 1.0400
DEF 2017-01-01 20:20:10 1.0100 1.0500
DEF 2017-01-01 20:20:15 1.0100 1.0600
Can anyone suggest how I might efficiently isolate the separate journeys? A Pandas solution is not essential.
Upvotes: 2
Views: 392
Reputation: 57033
The following splits the dataframe df into a list of dataframes, one per journey:
import numpy as np
import pandas as pd

delta = pd.to_timedelta(10, unit='m')
breaks = df['ts'].diff() > delta # Feel free to add other conditions!
#0 False
#....
#6 False
#7 True
#8 False
#9 False
#10 True
#11 False
#12 False
#13 False
#Name: ts, dtype: bool
break_locs = df[breaks].index
#Int64Index([7, 10], dtype='int64')
trips = np.array_split(df, break_locs)
#[ uid ts lat lon
#0 ABC 2017-01-01 00:00:00 0.00 0.00
#1 ABC 2017-01-01 00:00:05 0.00 0.01
#2 ABC 2017-01-01 00:00:10 0.00 0.02
#3 ABC 2017-01-01 00:10:00 0.01 0.03
#4 ABC 2017-01-01 00:10:05 0.01 0.04
#5 ABC 2017-01-01 00:10:10 0.01 0.05
#6 ABC 2017-01-01 00:10:15 0.01 0.06, uid ts lat lon
#7 DEF 2017-01-01 20:00:00 1.0 1.00
#8 DEF 2017-01-01 20:00:05 1.0 1.01
#9 DEF 2017-01-01 20:00:10 1.0 1.02, uid ts lat lon
#10 DEF 2017-01-01 20:20:00 1.01 1.03
#11 DEF 2017-01-01 20:20:05 1.01 1.04
#12 DEF 2017-01-01 20:20:10 1.01 1.05
#13 DEF 2017-01-01 20:20:15 1.01 1.06]
len(trips)
#3
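If you want each vehicle handled independently (so a uid change always starts a new journey, even when the time gap happens to be small), one option is to compute the diff within each uid group and label journeys with a cumulative sum. This is a sketch under that assumption, using a small hypothetical sample rather than your full file:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample mirroring the question's layout
data = StringIO("""uid,ts,lat,lon
ABC,2017-01-01 00:00:00,0.00,0.00
ABC,2017-01-01 00:00:05,0.00,0.01
ABC,2017-01-01 00:15:00,0.01,0.03
DEF,2017-01-01 20:00:00,1.00,1.00
DEF,2017-01-01 20:20:00,1.01,1.03
""")
df = pd.read_csv(data, parse_dates=['ts'])

delta = pd.Timedelta(minutes=10)

# Time gap computed per vehicle, so the diff never spans two uids
gap = df.groupby('uid')['ts'].diff() > delta

# A new journey starts on a large gap or a change of vehicle;
# cumsum turns the break flags into a running journey label
df['journey'] = (gap | (df['uid'] != df['uid'].shift())).cumsum()

trips = [g for _, g in df.groupby('journey')]
print(len(trips))  # 4 journeys in this sample
```

This also gives you a `journey` column you can keep on the dataframe instead of materialising a list, which is handy for later aggregation with `groupby(['uid', 'journey'])`.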
Upvotes: 5