Reputation: 2871
I have a dataframe as shown below
df:
ID Age_days N_30 N_31_90 N_91_180 N_181_365 Group
1 201 60 15 30 40 Good
2 20 2 15 5 20 Normal
3 10 4 0 0 0 Normal
4 100 0 0 0 80 Normal
5 600 0 6 5 60 Good
6 800 0 0 15 0 Good
7 500 10 10 30 40 Normal
8 200 0 0 0 100 Good
9 500 0 0 0 20 Normal
10 80 0 12 0 20 Normal
where
N_30 - Number of transactions in last 30 days
N_31_90 - Number of transactions in last 31 to 90 days and so on.
Conditions for filtering
If Age_days is less than 30, N_31_90, N_91_180, N_181_365 should be 0.
If Age_days is less than 90, N_91_180, N_181_365 should be 0.
If Age_days is less than 180, N_181_365 should be 0.
But in the above data there are some rows where Age_days is less and transacted before. I would like to filter such rows.
Expected output:
ID Age_days N_30 N_31_90 N_91_180 N_181_365 Group
2 20 2 15 5 20 Normal
4 100 0 0 0 80 Normal
10 80 0 12 0 20 Normal
Upvotes: 1
Views: 657
Reputation: 23217
Use Boolean Mask
to filter conditions:
m1 = (df['Age_days'] <= 30) & ((df['N_31_90'] !=0) | (df['N_91_180'] !=0) | (df['N_181_365'] !=0))
m2 = (df['Age_days'] <= 90) & ((df['N_91_180'] !=0) | (df['N_181_365'] !=0))
m3 = (df['Age_days'] <= 180) & (df['N_181_365'] !=0)
print(df[m1|m2|m3])
m1 is the boolean mask for the invalid condition where Age_days
is <= 30
while there are non-zero values for transactions performed more than 30 days ago. Similarly for m2 and m3.
Then we do a Boolean Or with m1|m2|m3
in df[m1|m2|m3]
to filter the rows with any one of the 3 invalid conditions.
Output:
ID Age_days N_30 N_31_90 N_91_180 N_181_365 Group
1 2 20 2 15 5 20 Normal
3 4 100 0 0 0 80 Normal
9 10 80 0 12 0 20 Normal
Upvotes: 1
Reputation: 1775
You can use this one liner:
import numpy as np
df2 = df.loc[df['Age_days'] < np.maximum(np.maximum((df['N_31_90'] > 0) * 31 , (df['N_91_180'] > 0) * 91), (df['N_181_365'] > 0) * 181)]
Upvotes: 1