Pandas GROUPBY inside np.where

Question

I'm getting an error with the code below:

np.where(df['A'].groupby([df['B'], df['B_1']]).sum() > 0, 1, 0)

error: ValueError: operands could not be broadcast together with shapes (2013,) (1353,) ()

Is it possible to use a pandas groupby inside np.where?

What is the best way to do this?

I would like to sum column df['A'] over the rows that share the same values in columns df['B'] and df['B_1'].

Formula in Excel:

=IF($J3=$C3,IF(SUMIFS($S:$S,$A:$A,$A3,$C:$C,$C3)>0,1,0),"")

Formula in Python:

df['C'] = np.where(df['B_1'] == df['B'], np.where(df['competing'].groupby([df['company_id'], df['company_id.1']]).sum() > 0, 1, 0), None)

Parfait · Accepted Answer

Excel's SUMIFS returns inline aggregates based on conditions: the returned values are the same length as the input values (i.e., the row count is the same before and after the calculation).

To achieve a similar result, consider pandas' groupby().transform(), which also returns inline aggregates: the returned column is the same length as the input column(s). Running groupby() by itself collapses the records to one row per group, so it returns a different number of values. That length mismatch is what triggers the broadcast ValueError inside np.where.

df['C'] = np.where(df['B_1'] == df['B'], 
                   np.where(df.groupby(['company_id', 'company_id.1'])['competing'].transform('sum') > 0, 1, 0),
                   np.nan)
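
To make the length difference concrete, here is a minimal sketch on a small made-up frame. The column names follow the question, but the values are invented for illustration and the intermediate names (grouped, collapsed, inline, has_competing) are my own. It compares groupby().sum() with groupby().transform('sum') and then builds column C as above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "company_id":   [1, 1, 2, 2, 3],
    "company_id.1": [10, 10, 20, 20, 30],
    "competing":    [0, 1, 0, 0, 1],
    "B":            ["x", "y", "x", "y", "x"],
    "B_1":          ["x", "x", "x", "y", "z"],
})

grouped = df.groupby(["company_id", "company_id.1"])["competing"]

# sum() collapses to one value per group, so its length differs from len(df).
collapsed = grouped.sum()

# transform("sum") broadcasts each group's sum back onto every row of that group,
# so the result lines up with the original rows (like SUMIFS in Excel).
inline = grouped.transform("sum")

print(len(df), len(collapsed), len(inline))

# Inner condition: 1 if the group has any competing rows, else 0.
has_competing = np.where(inline > 0, 1, 0)

# Outer condition: only fill C where B_1 equals B; otherwise leave it missing.
df["C"] = np.where(df["B_1"] == df["B"], has_competing, np.nan)
print(df)
```

Because np.nan is a float, column C ends up as a float column (1.0, 0.0, NaN). If integer flags with missing values are preferred, casting the result with df['C'].astype('Int64') should work, since pandas' nullable integer type accepts NaN as missing.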
