Sanja
Sanja

Reputation: 41

Pandas groupby based on time difference

I am trying to assign a group number to a set of lines with the small time difference inside one group. To do that, I have tried using:

df["Group_id"] = df.groupby("DEVICE")["DATE_EVT"].diff().gt('0 days 00:00:60').cumsum()

However, diff creates nan values for each first element in the group, which causes sometimes the merge between two consecutive groups. To fill in nan values with some large value, I have tried using:

df["Group_id"] = df.groupby("DEVICE")["DATE_EVT"].diff().fillna('10 days 00:00:00.000000').gt('0 days 00:00:60').cumsum()

but I get an error message:

TypeError: '>' not supported between instances of 'Timedelta' and 'str'

Here is the excert of dataframe:

enter image description here

The group 12877 should end when DEV1 has ended since there is a groupby DEVICE.

Upvotes: 1

Views: 552

Answers (1)

Sanja
Sanja

Reputation: 41

Casting both of fields to time delta has solved the problem:

df["Group_id"] = df.groupby("DEVICE")["DATE_EVT"].diff().fillna(pd.Timedelta('10 days 00:00:00.000000')).gt(pd.Timedelta('0 days 00:00:60')).cumsum()

Upvotes: 1

Related Questions