Roshankumar
Roshankumar

Reputation: 365

tagging the column based on conditions

Existing Dataframe :

Id        Month       Year        processed      success
A          Jan        2021             0           0
A          Feb        2021             0           1
A          Mar        2021             1           0
B          Jan        2021             0           1
B          Feb        2021             0           0
B          Mar        2021             0           0
B          Apr        2021             0           0
C          Dec        2021             0           0
C          Jan        2022             0           0
C          Feb        2022             1           0

Expected Dataframe :

  Id            final_status
  A             Paid
  B             UnPaid
  C             Paid

I am trying to create a Tag as Paid and UnPaid for the Id. Condition for UnPaid is to check for the last three consecutive month if either success or Processed doen't have any count it is to be flagged as "Unpaid" else "Paid"

stuck with applying conditions.

Upvotes: 2

Views: 40

Answers (3)

G.G
G.G

Reputation: 765

def function1(dd:pd.DataFrame):
    idxmin1=dd.query("processed==0 and success==0").index.min()
    return "UnPaid" if dd.loc[idxmin1:].pipe(len)>=3 else "Paid"

col1=((df1.processed.add(df1.success))>0).cumsum()
df1.groupby(['Id',col1]).apply(function1).rename("final_status").groupby(level=0).first()

Output:

  Id final_status
0  A         Paid
1  B       UnPaid
2  C         Paid

Upvotes: 0

mozway
mozway

Reputation: 260420

You can get the last 3 rows per group, and check whether there is at least one 1 in the whole slice.

For this use groupby.apply, in which the last 3 rows/values are selected with tail, compared to 1 with eq and finally check if there is at least one occurrence with all.

One option:

N = 3
(df.groupby('Id')
   .apply(lambda d: d.tail(N).eq(1).any(axis=1).any())
   .map({True: 'Paid', False: 'UnPaid'})
   .reset_index(name='final_status')
)

Or:

N = 3
(df[['processed', 'success']]
 .eq(1).any(axis=1)
 .groupby(df['Id']).apply(lambda s: s.tail(N).any())
 .map({True: 'Paid', False: 'UnPaid'})
 .reset_index(name='final_status')
)

Another with :

N = 3
(df.groupby('Id')
   .apply(lambda d: np.where(d.eq(1).to_numpy()[-N:].any(), 'Paid', 'UnPaid'))
   .reset_index(name='final_status')
)

Output:

  Id final_status
0  A         Paid
1  B       UnPaid
2  C         Paid

Upvotes: 0

jezrael
jezrael

Reputation: 862511

First create helper Series for test if not 1 in both columns by DataFrame.ne and DataFrame.all and then aggregate by GroupBy.agg with numpy.where:

df1 = (df[['processed','success']].ne(1).all(axis=1)
        .groupby(df['Id']).agg(lambda x: np.where(x[-3:].all(), 'UnPaid', 'Paid'))
        .reset_index(name='final_status')
        )
print (df1)
  Id final_status
0  A         Paid
1  B       UnPaid
2  C         Paid

Details:

print (df[['processed','success']].ne(1).all(axis=1))
0     True
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9    False
dtype: bool

Upvotes: 1

Related Questions