Reputation: 41
I am trying to assign a group number to a set of lines with the small time difference inside one group. To do that, I have tried using:
df["Group_id"] = df.groupby("DEVICE")["DATE_EVT"].diff().gt('0 days 00:00:60').cumsum()
However, diff creates nan values for each first element in the group, which causes sometimes the merge between two consecutive groups. To fill in nan values with some large value, I have tried using:
df["Group_id"] = df.groupby("DEVICE")["DATE_EVT"].diff().fillna('10 days 00:00:00.000000').gt('0 days 00:00:60').cumsum()
but I get an error message:
TypeError: '>' not supported between instances of 'Timedelta' and 'str'
Here is the excert of dataframe:
The group 12877 should end when DEV1 has ended since there is a groupby DEVICE.
Upvotes: 1
Views: 552
Reputation: 41
Casting both of fields to time delta has solved the problem:
df["Group_id"] = df.groupby("DEVICE")["DATE_EVT"].diff().fillna(pd.Timedelta('10 days 00:00:00.000000')).gt(pd.Timedelta('0 days 00:00:60')).cumsum()
Upvotes: 1