Reputation: 517
I have a df with duplicates as such:
ID start_time end_time
A 12:00 12:05
A 12:00 12:05
A 12:00 12:05
B 12:10 12:22
B 12:10 12:22
C 12:20 12:25
C 12:20 12:25
I'm trying to create a column that tells whether an ID's start_time is less then the previous ID's end_time. The resultant df should be as follows:
ID start_time end_time X
A 12:00 12:05 False
A 12:00 12:05 False
A 12:00 12:05 False
B 12:10 12:22 False
B 12:10 12:22 False
C 12:20 12:25 True
C 12:20 12:25 True
As you can see, ID = C is true since its start_time of 12:20 is less than the previous ID end_time of 12:22. The resulting df must keep the duplicates due to unique columns not included in the scope of this issue.
I think I'm pretty close with:
df['x'] = df.drop_duplicates('ID')['start_time'].shift().le(df['end_time'])
However, the result doesn't carry the correct result through the duplicates.
Upvotes: 1
Views: 49
Reputation: 71689
Use:
df["X"] = (
df["ID"].ne(df["ID"].shift())
& pd.to_datetime(df["start_time"]).lt(pd.to_datetime(df["end_time"]).shift())
)
df["X"] = df.groupby("ID")["X"].transform(lambda g: any(g))
print(df)
This prints:
ID start_time end_time X
0 A 12:00 12:05 False
1 A 12:00 12:05 False
2 A 12:00 12:05 False
3 B 12:10 12:22 False
4 B 12:10 12:22 False
5 C 12:20 12:25 True
6 C 12:20 12:25 True
Upvotes: 1