Danish
Danish

Reputation: 2871

Filter a dataframe based specific condition in pandas

I have a dataframe as shown below

df:

ID     Age_days    N_30     N_31_90     N_91_180      N_181_365   Group
1      201         60       15          30            40          Good
2      20          2        15          5             20          Normal
3      10          4        0           0             0           Normal
4      100         0        0           0             80          Normal
5      600         0        6           5             60          Good
6      800         0        0           15            0           Good
7      500         10       10          30            40          Normal     
8      200         0        0           0             100         Good
9      500         0        0           0             20          Normal
10     80          0        12          0             20          Normal

where

N_30 - Number of transactions in last 30 days
N_31_90 - Number of transactions in last 31 to 90 days and so on.

Conditions for filtering

      If Age_days is less than 30, N_31_90, N_91_180, N_181_365 should be 0.
      If Age_days is less than 90, N_91_180, N_181_365 should be 0.
      If Age_days is less than 180, N_181_365 should be 0.

But in the above data there are some rows where Age_days is less and transacted before. I would like to filter such rows.

Expected output:

ID     Age_days    N_30     N_31_90     N_91_180      N_181_365   Group
2      20          2        15          5             20          Normal
4      100         0        0           0             80          Normal
10     80          0        12          0             20          Normal

Upvotes: 1

Views: 657

Answers (2)

SeaBean
SeaBean

Reputation: 23217

Use Boolean Mask to filter conditions:

m1 = (df['Age_days'] <= 30) & ((df['N_31_90'] !=0) | (df['N_91_180'] !=0) | (df['N_181_365'] !=0))
m2 = (df['Age_days'] <= 90) & ((df['N_91_180'] !=0) | (df['N_181_365'] !=0))
m3 = (df['Age_days'] <= 180) & (df['N_181_365'] !=0)

print(df[m1|m2|m3])

m1 is the boolean mask for the invalid condition where Age_days is <= 30 while there are non-zero values for transactions performed more than 30 days ago. Similarly for m2 and m3.

Then we do a Boolean Or with m1|m2|m3 in df[m1|m2|m3] to filter the rows with any one of the 3 invalid conditions.

Output:

   ID  Age_days  N_30  N_31_90  N_91_180  N_181_365   Group
1   2        20     2       15         5         20  Normal
3   4       100     0        0         0         80  Normal
9  10        80     0       12         0         20  Normal

Upvotes: 1

CFreitas
CFreitas

Reputation: 1775

You can use this one liner:

import numpy as np
df2 = df.loc[df['Age_days'] < np.maximum(np.maximum((df['N_31_90'] > 0) * 31 , (df['N_91_180'] > 0) * 91), (df['N_181_365'] > 0) * 181)]

Upvotes: 1

Related Questions