Steel
Steel

Reputation: 517

Comparing python row values with previous (duplicates)

I have a df with duplicates as such:

ID      start_time      end_time    
A          12:00           12:05
A          12:00           12:05
A          12:00           12:05
B          12:10           12:22
B          12:10           12:22
C          12:20           12:25
C          12:20           12:25

I'm trying to create a column that tells whether an ID's start_time is less then the previous ID's end_time. The resultant df should be as follows:

ID      start_time      end_time      X
A          12:00           12:05     False
A          12:00           12:05     False
A          12:00           12:05     False
B          12:10           12:22     False
B          12:10           12:22     False
C          12:20           12:25     True
C          12:20           12:25     True

As you can see, ID = C is true since its start_time of 12:20 is less than the previous ID end_time of 12:22. The resulting df must keep the duplicates due to unique columns not included in the scope of this issue.

I think I'm pretty close with:

df['x'] = df.drop_duplicates('ID')['start_time'].shift().le(df['end_time'])

However, the result doesn't carry the correct result through the duplicates.

Upvotes: 1

Views: 49

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use:

df["X"] = (
    df["ID"].ne(df["ID"].shift()) 
    & pd.to_datetime(df["start_time"]).lt(pd.to_datetime(df["end_time"]).shift())
)

df["X"] = df.groupby("ID")["X"].transform(lambda g: any(g))
print(df)

This prints:

  ID start_time end_time      X
0  A      12:00    12:05  False
1  A      12:00    12:05  False
2  A      12:00    12:05  False
3  B      12:10    12:22  False
4  B      12:10    12:22  False
5  C      12:20    12:25   True
6  C      12:20    12:25   True

Upvotes: 1

Related Questions