Yam
Yam

Reputation: 145

Retain a row based on a condition

I have the following dataframe:

 account_id contract_id status
0   AAA 123 Active
1   AAA 456 Inactive
2   AAA 912 Active
3   BB  462 Inactive
4   BB  821 Inactive

I want the following output:

 account_id status
0   AAA Active
1   BB  Inactive

The condition for this is that I need the status for each account and if at least one contract for that account is "Active" then the account will be considered as "Active". If all the contracts for the account is "Inactive", then the account will be considered as "Inactive".

Upvotes: 1

Views: 103

Answers (1)

jezrael
jezrael

Reputation: 862601

If only Active and Inactive values in status column use DataFrame.sort_values with DataFrame.drop_duplicates:

df = df.sort_values(['account_id','status']).drop_duplicates('account_id')[['account_id','status']]
print (df)
  account_id    status
1        AAA    Active
3         BB  Inactive

If not first use Series.where for set non Active, Inactive to missing values:

s = df['status'].where(df['status'].isin(['Active', 'Inactive']))
df = df.assign(status=s).sort_values(['account_id','status']).drop_duplicates('account_id')[['account_id','status']]
print (df)
  account_id    status
1        AAA    Active
3         BB  Inactive

Another idea is aggregate min:

#non `Active, Inactive` are set to missing values
df['status'] = df['status'].where(df['status'].isin(['Active', 'Inactive']))
df = df.groupby('account_id', as_index=False)['status'].min()
print (df)
  account_id    status
0        AAA    Active
1         BB  Inactive

Upvotes: 2

Related Questions