filtering long format df by column condition

Question

I have a dataframe that includes timestamps of orders, that has 2 status including the worked hours

import pandas as pd

df = pd.DataFrame({'order': ['1', '1', '1', '1',
                             '2', '2', '2', '2', '2'],
                   'status': ['A', 'A', 'A', 'B',
                              'A', 'A', 'A', 'A', 'B'],
                   'started_work_hour': ['10:05', '10:10', '11:15', '11:07',
                                         '09:00', '09:10', '09:25', '09:30', '09:20']})
df['started_work_hour'] = pd.to_datetime(df['started_worked_hour'])
print(df)

#  order status started_work_hour
#0     1      A             10:05
#1     1      A             10:10
#2     1      A             11:15
#3     1      B             11:07
#4     2      A             09:00
#5     2      A             09:10
#6     2      A             09:25
#7     2      A             09:30
#8     2      B             09:20

I need to filter all status A that his time is over status B, so for example in order 1 the dataframe will include 10:05 and 10:10 in status A and 11:07 in status B.

#     order status started_work_hour
#0     1      A             10:05
#1     1      A             10:10
#2     1      B             11:07
#3     2      A             09:00
#4     2      A             09:10
#5     2      B             09:20

I tried using pivot table and then filtering by row:

df = df.reset_index()
df = df.pivot_table(index=['index', 'order'],
                    columns=['status'],
                    values=['started_work_hour'], aggfunc='first')

df = df[ df['A'] < df['B']]

But when I'm doing it, obviously I got NaN in columns A and B so it's not working. How can I filter it? Thanks for the help!

Quang Hoang · Accepted Answer

You can use map which is a simpler version of merge:

B_val = df.query('status=="B"').set_index('order')['started_work_hour']
df[df['started_work_hour'] < df['order'].map(B_val)]

Output:

  order status started_work_hour
0     1      A             10:05
1     1      A             10:10
4     2      A             09:00
5     2      A             09:10

filtering long format df by column condition

Answers (2)

Related Questions