Reputation: 1466
The data below is based on GPS coordinates of a van, whether the ignition was on/off, and how far the van was from a target location at a given time. I want to determine whether a van was at or near a location (<300), whether the ignition was turned off, and if both conditions are true, the time duration of the stay.
In the example below, I visualize rows 1-4 as being "grouped" together since they are consecutive rows where the distance was <300. Row 5 is "grouped" on its own since it was >300, and rows 6-8 are "grouped" together since they are consecutive rows with distance <300.
Accordingly, since the ignition was turned off in rows 1-4, I want to calculate the time duration (since the van "stopped" at the location for a given amount of time). However, the other two groups (row 5 and rows 6-8) should not have a time duration calculation since the ignition was never turned off in those groupings.
df
AcctID On_Off Distance Timestamp
123 On 230 12:00
123 On 30 12:02
123 Off 29 12:05
123 Off 35 12:10
123 On 3000 12:13
123 On 100 12:20
123 On 95 12:22
123 On 240 12:28
I'm able to classify whether the Distance is less than 300 (Within_Distance
), but determining whether the ignition was off for at least one of the rows in the grouping has me stumped. Here's what the final dataframe should look like:
df['Within_Distance'] = np.where(df['Distance']<300, "Yes", "No")
df
AcctID On_Off Distance Timestamp Within_Distance Was_Off Within_Distance_and_Was_Off
123 On 230 12:20 Yes Yes Yes
123 On 30 12:02 Yes Yes Yes
123 Off 29 12:05 Yes Yes Yes
123 Off 35 12:10 Yes Yes Yes
123 On 3000 12:13 No No No
123 On 100 12:20 Yes No No
123 On 95 12:22 Yes No No
123 On 240 12:28 Yes No No
Thanks in advance!
Upvotes: 2
Views: 7001
Reputation: 2369
First, set up a boolean field to work with
df['Off'] = df['On_Off'] == 'Off'
Then construct a field that identifies consecutive rows for groupby
, as shown here
(df['Within_Distance'] != df['Within_Distance'].shift()).cumsum()
And use .any
to identify where the boolean is true for any row in the groupby:
df['Was_Off'] = df.groupby((df['Within_Distance'] != df['Within_Distance'].shift()).cumsum())['Off'].transform(any)
Out[31]:
AcctID On_Off Distance Timestamp Within_Distance Off Was_Off
0 123 On 230 12:00 Yes False True
1 123 On 30 12:02 Yes False True
2 123 Off 29 12:05 Yes True True
3 123 Off 35 12:10 Yes True True
4 123 On 3000 12:13 No False False
5 123 On 100 12:20 Yes False False
6 123 On 95 12:22 Yes False False
7 123 On 240 12:28 Yes False False
Upvotes: 1
Reputation: 153510
Let's try:
df['Within_Distance'] = np.where(df['Distance']<300, "Yes", "No")
df['Was_Off'] = df.groupby((df.Distance > 300).diff().fillna(0).cumsum())['On_Off'].transform(lambda x: 'Yes' if (x == 'Off').any() else 'No')
df['Within_Distinace_and_Was_Off'] = np.where((df['Within_Distance'] == 'Yes') & (df['Was_Off'] == 'Yes'),'Yes','No')
Output:
AcctID On_Off Distance Timestamp Within_Distance Was_Off \
0 123 On 230 12:00 Yes Yes
1 123 On 30 12:02 Yes Yes
2 123 Off 29 12:05 Yes Yes
3 123 Off 35 12:10 Yes Yes
4 123 On 3000 12:13 No No
5 123 On 100 12:20 Yes No
6 123 On 95 12:22 Yes No
7 123 On 240 12:28 Yes No
Within_Distinace_and_Was_Off
0 Yes
1 Yes
2 Yes
3 Yes
4 No
5 No
6 No
7 No
Upvotes: 3