Reputation: 145
I have the following dataframe:
account_id contract_id status
0 AAA 123 Active
1 AAA 456 Inactive
2 AAA 912 Active
3 BB 462 Inactive
4 BB 821 Inactive
I want the following output:
account_id status
0 AAA Active
1 BB Inactive
The condition for this is that I need the status for each account and if at least one contract for that account is "Active" then the account will be considered as "Active". If all the contracts for the account is "Inactive", then the account will be considered as "Inactive".
Upvotes: 1
Views: 103
Reputation: 862601
If only Active
and Inactive
values in status
column use DataFrame.sort_values
with DataFrame.drop_duplicates
:
df = df.sort_values(['account_id','status']).drop_duplicates('account_id')[['account_id','status']]
print (df)
account_id status
1 AAA Active
3 BB Inactive
If not first use Series.where
for set non Active, Inactive
to missing values:
s = df['status'].where(df['status'].isin(['Active', 'Inactive']))
df = df.assign(status=s).sort_values(['account_id','status']).drop_duplicates('account_id')[['account_id','status']]
print (df)
account_id status
1 AAA Active
3 BB Inactive
Another idea is aggregate min
:
#non `Active, Inactive` are set to missing values
df['status'] = df['status'].where(df['status'].isin(['Active', 'Inactive']))
df = df.groupby('account_id', as_index=False)['status'].min()
print (df)
account_id status
0 AAA Active
1 BB Inactive
Upvotes: 2