pandas dataframe create a new column whose values are based on groupby sum on another column

Question

I am trying to create a new column amount_0_flag for a df. Its values are based on grouping by another column, key: if the sum of amount within a group is 0, every row in that group is assigned True in amount_0_flag, otherwise False. The df looks like this:

key    amount    amount_0_flag    negative_amount
1       1.0      True             False
1       1.0      True             True
2       2.0      False            True
2       3.0      False            False
2       4.0      False            False

So with df.groupby('key'), every row in the cluster with key=1 is assigned True in amount_0_flag, since within that cluster one element has an amount of negative 1 and the other has an amount of positive 1.

df.groupby('key')['amount'].sum()

only gives the sum of amount for each cluster without taking the values in negative_amount into account. How can I also find the clusters (and their rows) whose amounts sum to 0 once negative_amount is considered, using pandas/numpy?

Scott Boston · Accepted Answer

Let's try this. I created a 'new_column' so you can compare it against your 'amount_0_flag':

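import numpy as np

# Flip the sign of amount where negative_amount is True, then flag groups whose signed sum is 0
df['new_column'] = (df.assign(amount_n = df.amount * np.where(df.negative_amount,-1,1))
                      .groupby('key')['amount_n']
                      .transform(lambda x: x.sum() == 0))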

Output:

   key  amount  amount_0_flag  negative_amount  new_column
0    1     1.0           True            False        True
1    1     1.0           True             True        True
2    2     2.0          False             True       False
3    2     3.0          False            False       False
4    2     4.0          False            False       False
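
Because the amounts are floats, a comparison of the group sum against 0 can be sensitive to rounding. The following is a minimal, self-contained sketch of a variant, not part of the original answer: it rebuilds the sample frame from the question, uses the built-in "sum" transform instead of a lambda, and compares with np.isclose. The names signed and group_sum are introduced here for illustration, and the default tolerance of np.isclose is an assumption that may need adjusting for your data.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "key": [1, 1, 2, 2, 2],
    "amount": [1.0, 1.0, 2.0, 3.0, 4.0],
    "negative_amount": [False, True, True, False, False],
})

# Signed amount: keep amount where negative_amount is False, negate it otherwise.
signed = df["amount"].where(~df["negative_amount"], -df["amount"])

# Broadcast each group's sum back onto its rows.
group_sum = signed.groupby(df["key"]).transform("sum")

# Flag rows whose group sum is (approximately) zero.
df["amount_0_flag"] = np.isclose(group_sum, 0.0)

print(df)
```

Using transform("sum") keeps the result aligned with the original index, so it can be assigned straight back as a column without a merge.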
