Reputation: 67
I have a dataframe which looks like this:
df:
id|flag|fee
1 |0 |5
1 |0 |5
1 |1 |5
1 |1 |5
DESRIED df_aggregated:
id|flag|fee
1 |2 |10
The aggregate should count the number of flags per id and the fee should sum per id when the flag is set to 1:
df1=df.groupby(['id'])["flag"].apply(lambda x : x.astype(int).count()).reset_index()
df2=df.groupby(['id'])["fee"].apply(lambda x : x.astype(int).sum()).reset_index()
df_aggregated=pd.merge(df1, df2, on='id', how='inner')
ACTUAL df_aggregated:
id|flag|fee
1 |2 |20
My fee aggregation is NOT correct/complete because it is not accounting for the condition of only summing the fee IF THE FLAG=1. Instead if sums up all fees regarding of the flag. How do I change my code to account for this condition? It should look like the DESIRED df_aggregated table.
Thanks!
Upvotes: 0
Views: 180
Reputation: 150755
You need to check for the condition flag==1
. In doing so, you can multiply fee
with df.flag.eq(1)
:
(df.assign(fee=df.fee*df.flag.eq(1))
.groupby('id', as_index=False)
.agg({'flag':'nunique', 'fee':'sum'})
)
Output:
id flag fee
0 1 2 10
If you want both to count/sum only where flag==1
, you can do a query first:
(df.query('flag==1')
.groupby('id', as_index=False)
.agg({'flag':'count', 'fee':'sum'})
)
which incidentally gives the same output as above.
Upvotes: 1