Chethan
Chethan

Reputation: 611

Remove the rows of dataframe based on date and flag condition in pandas

I have a dataframe

df = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","14-02-2022","D","FALSE"],
                   ["A","14-02-2022","E","FALSE"],["A","16-02-2022","A","TRUE"],["A","16-02-2022","F","FALSE"],
                   ["A","17-02-2022","G","FALSE"],["A","17-02-2022","H","FALSE"],["A","18-02-2022","I","FALSE"],
                   ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   14-02-2022  D   FALSE
A   14-02-2022  E   FALSE
A   16-02-2022  A   TRUE
A   16-02-2022  F   FALSE
A   17-02-2022  G   FALSE
A   17-02-2022  H   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

I want to remove all the rows of previous working day, next working day and the day where flag is TRUE.

For example here 16th Feb flag is TRUE, so remove all the rows of previous working day 14th Feb, next working day 17th Feb and 16th Feb. If TRUE is in last day of month 28th Feb where next working day is not there, then remove the rows of TRUE flag day and previous working day only.

Expected Output:

df_out = pd.DataFrame([["A","13-02-2022","B","FALSE"],["A","13-02-2022","C","FALSE"],["A","18-02-2022","I","FALSE"],
                       ["A","18-02-2022","J","FALSE"]],columns=["id1","date","id2","flag"])
id1   date     id2  flag
A   13-02-2022  B   FALSE
A   13-02-2022  C   FALSE
A   18-02-2022  I   FALSE
A   18-02-2022  J   FALSE

How to do it?

Upvotes: 1

Views: 219

Answers (2)

Guru Stron
Guru Stron

Reputation: 141665

You can try to create a filter data frame and select everything which is not in it:

df['date'] = pd.to_datetime(df['date'], format="%d-%m-%Y")

dates = df[df.flag == 'TRUE']['date']
to_drop = pd.concat([dates, dates + pd.offsets.BusinessDay(1), dates - pd.offsets.BusinessDay(1)])
df_out = df[~df['date'].isin(to_drop)]
df_out

Upvotes: 1

mozway
mozway

Reputation: 260335

You can use boolean indexing:

# ensure boolean and datetime
df['flag'] = df['flag'].eq('TRUE')
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

bday = pd.offsets.BusinessDay(1)

drop = pd.concat([dates+bday, dates-bday])

out = df[~(df['date'].isin(drop) | df['flag'])]

Output:

  id1       date id2   flag
0   A 2022-02-13   B  False
1   A 2022-02-13   C  False
2   A 2022-02-14   D  False
3   A 2022-02-14   E  False
5   A 2022-02-16   F  False
8   A 2022-02-18   I  False
9   A 2022-02-18   J  False

Upvotes: 2

Related Questions