user2458922
user2458922

Reputation: 1721

Pandas DataFrame, computing the Time Difference between one row and other row which satisfies a condition

computing the Time Difference based on Two rows, picked upon a condition. First Row is the Current Row, and Second Row is picked upon a condition. Condition can be Value==1 for example.

Illustratively

The Data Frame has Time Stamp and Event Occurred or Not.

enter image description here

Event is [ Master / B / C].

Output DataFrame should Compute, Time Left for Next Event . Eg: Master

Event took place at 12:33. So at 12:30 the Time Left for the Event=3 ie [ 12:33 - 12:30 ] in Minutes. We COuld fill NA if the last next Event is not within the Data Frame time window.

Sample Output

Upvotes: 1

Views: 1084

Answers (1)

Jon Strutz
Jon Strutz

Reputation: 321

Load a test DataFrame:

df = pd.DataFrame({'year': [2019] * 5,
                   'month': [8] * 5,
                   'day': [16] * 5,
                   'hour': [12, 12, 12, 12, 13],
                   'minute': [1, 2, 3, 4, 5]})

df = pd.DataFrame(pd.to_datetime(df), columns=['Time_Stamp'])
df['Event_Master'] = [0, 0, 1, 0, 1]

Loaded DataFrame looks like:

           Time_Stamp  Event_Master
0 2019-08-16 12:01:00             0
1 2019-08-16 12:02:00             0
2 2019-08-16 12:03:00             1
3 2019-08-16 12:04:00             0
4 2019-08-16 13:05:00             1

To solve the problem, I first add a temporary column called 'Next_Timestamp' which at first just grabs the timestamp when given event is a 1. It is pd.NaT otherwise. We can then use the fillna method to backfill the pd.NaT values with the time of the next event for each row. It now contains the time of the next event for each row. Finally, we just subtract the 'Time_Stamp' column from the 'Next_Timestamp' column.

df['Next_Timestamp'] = df[df.Event_Master == 1].Time_Stamp
df['Next_Timestamp'].fillna(method='backfill', inplace=True)

df['TimeDiff'] = df.Next_Timestamp - df.Time_Stamp

The DataFrame now looks like:

           Time_Stamp  Event_Master      Next_Timestamp TimeDiff
0 2019-08-16 12:01:00             0 2019-08-16 12:03:00 00:02:00
1 2019-08-16 12:02:00             0 2019-08-16 12:03:00 00:01:00
2 2019-08-16 12:03:00             1 2019-08-16 12:03:00 00:00:00
3 2019-08-16 12:04:00             0 2019-08-16 13:05:00 01:01:00
4 2019-08-16 13:05:00             1 2019-08-16 13:05:00 00:00:00

Finally, drop the temporary 'Next_Timestamp' column:

df.drop(['Next_Timestamp'], axis=1, inplace=True)

And the final DataFrame looks like:

           Time_Stamp  Event_Master TimeDiff
0 2019-08-16 12:01:00             0 00:02:00
1 2019-08-16 12:02:00             0 00:01:00
2 2019-08-16 12:03:00             1 00:00:00
3 2019-08-16 12:04:00             0 01:01:00
4 2019-08-16 13:05:00             1 00:00:00

Repeat for other columns as needed. Hope this helps!

Upvotes: 1

Related Questions