Lroy_12374
Lroy_12374

Reputation: 67

How to aggregate based on condition of a column in pandas?

I have a dataframe which looks like this:

    df:
    id|flag|fee
    1 |0   |5
    1 |0   |5
    1 |1   |5
    1 |1   |5

    DESRIED df_aggregated:
    id|flag|fee
    1 |2   |10

The aggregate should count the number of flags per id and the fee should sum per id when the flag is set to 1:

df1=df.groupby(['id'])["flag"].apply(lambda x : x.astype(int).count()).reset_index()
df2=df.groupby(['id'])["fee"].apply(lambda x : x.astype(int).sum()).reset_index()
df_aggregated=pd.merge(df1, df2, on='id', how='inner')

   ACTUAL df_aggregated:
   id|flag|fee
   1 |2   |20

My fee aggregation is NOT correct/complete because it is not accounting for the condition of only summing the fee IF THE FLAG=1. Instead if sums up all fees regarding of the flag. How do I change my code to account for this condition? It should look like the DESIRED df_aggregated table.

Thanks!

Upvotes: 0

Views: 180

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150755

You need to check for the condition flag==1. In doing so, you can multiply fee with df.flag.eq(1):

(df.assign(fee=df.fee*df.flag.eq(1))
   .groupby('id', as_index=False)
   .agg({'flag':'nunique', 'fee':'sum'})
)

Output:

   id  flag  fee
0   1     2   10

If you want both to count/sum only where flag==1, you can do a query first:

(df.query('flag==1')
   .groupby('id', as_index=False)
   .agg({'flag':'count', 'fee':'sum'})
)

which incidentally gives the same output as above.

Upvotes: 1

Related Questions