importm
importm

Reputation: 305

filtering long format df by column condition

I have a dataframe that includes timestamps of orders, that has 2 status including the worked hours

import pandas as pd

df = pd.DataFrame({'order': ['1', '1', '1', '1',
                             '2', '2', '2', '2', '2'],
                   'status': ['A', 'A', 'A', 'B',
                              'A', 'A', 'A', 'A', 'B'],
                   'started_work_hour': ['10:05', '10:10', '11:15', '11:07',
                                         '09:00', '09:10', '09:25', '09:30', '09:20']})
df['started_work_hour'] = pd.to_datetime(df['started_worked_hour'])
print(df)

#  order status started_work_hour
#0     1      A             10:05
#1     1      A             10:10
#2     1      A             11:15
#3     1      B             11:07
#4     2      A             09:00
#5     2      A             09:10
#6     2      A             09:25
#7     2      A             09:30
#8     2      B             09:20

I need to filter all status A that his time is over status B, so for example in order 1 the dataframe will include 10:05 and 10:10 in status A and 11:07 in status B.

#     order status started_work_hour
#0     1      A             10:05
#1     1      A             10:10
#2     1      B             11:07
#3     2      A             09:00
#4     2      A             09:10
#5     2      B             09:20

I tried using pivot table and then filtering by row:

df = df.reset_index()
df = df.pivot_table(index=['index', 'order'],
                    columns=['status'],
                    values=['started_work_hour'], aggfunc='first')

df = df[ df['A'] < df['B']]

But when I'm doing it, obviously I got NaN in columns A and B so it's not working. How can I filter it? Thanks for the help!

Upvotes: 0

Views: 150

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

You can use map which is a simpler version of merge:

B_val = df.query('status=="B"').set_index('order')['started_work_hour']
df[df['started_work_hour'] < df['order'].map(B_val)]

Output:

  order status started_work_hour
0     1      A             10:05
1     1      A             10:10
4     2      A             09:00
5     2      A             09:10

Upvotes: 1

perl
perl

Reputation: 9941

You can merge to a DataFrame containing only status='B' records, then query to find records where started_work_hour is less than or equal to the one in B DataFrame with the same order:

(df
    .merge(
        df[df['status'].eq('B')],
        on='order',
        suffixes=['', '_'])
    .query('started_work_hour <= started_work_hour_')
)[df.columns]

Output:

  order status   started_work_hour
0     1      A 2021-05-10 10:05:00
1     1      A 2021-05-10 10:10:00
3     1      B 2021-05-10 11:07:00
4     2      A 2021-05-10 09:00:00
5     2      A 2021-05-10 09:10:00
8     2      B 2021-05-10 09:20:00

Upvotes: 2

Related Questions