Reputation: 335
I'm getting an error with the code below:
np.where(df['A'].groupby([df['B'], df['B_1']]).sum() > 0, 1, 0)
error: ValueError: operands could not be broadcast together with shapes (2013,) (1353,) ()
Is it possible to do a pandas groupby inside np.where? What is the best way to do this?
I would like to sum column df[A], grouped by columns df[B] and df[B_1].
Formula in Excel:
=IF($J3=$C3,IF(SUMIFS($S:$S,$A:$A,$A3,$C:$C,$C3)>0,1,0),"")
Formula in Python:
df['C'] = np.where(df['B_1'] == df['B'], np.where(df['competing'].groupby([df['company_id'], df['company_id.1']]).sum() > 0, 1, 0), None)
Upvotes: 3
Views: 7319
Reputation: 107652
Excel's SUMIFS returns inline aggregates based on conditions: the returned values have the same length as the input values, so the row count is the same before and after the calculation.
To achieve a similar result, consider pandas' groupby().transform(), which also returns inline aggregates, so the returned column is the same length as the input column(s). Running groupby() with an aggregation by itself collapses the records to one row per group and returns a different number of values, which is why np.where cannot broadcast that result against the original columns.
df['C'] = np.where(df['B_1'] == df['B'],
                   np.where(df.groupby(['company_id', 'company_id.1'])['competing'].transform('sum') > 0, 1, 0),
                   np.nan)
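To make the length difference concrete, here is a minimal sketch on made-up data. The five-row frame and its values are hypothetical; only the column names are taken from the question. It compares the collapsed result of groupby().sum() with the row-aligned result of groupby().transform('sum'), then applies the same nested np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "company_id":   [1, 1, 2, 2, 3],
    "company_id.1": [1, 1, 2, 5, 3],
    "competing":    [0, 1, 0, 0, 0],
    "B":            ["x", "y", "x", "z", "y"],
    "B_1":          ["x", "y", "z", "z", "q"],
})

keys = ["company_id", "company_id.1"]

# Aggregating collapses the frame to one value per group,
# so its length differs from the original frame.
collapsed = df.groupby(keys)["competing"].sum()

# transform('sum') broadcasts each group's sum back onto its rows,
# so the result lines up with the original index (like SUMIFS).
inline = df.groupby(keys)["competing"].transform("sum")

print(len(df), len(collapsed), len(inline))

# Both conditions now have the same length, so np.where can combine them.
df["C"] = np.where(df["B_1"] == df["B"],
                   np.where(inline > 0, 1, 0),
                   np.nan)
print(df)
```

Because np.nan is a float, column C comes out as a float column; if an integer column with missing values is needed, one option is to convert it afterwards with pandas' nullable Int64 dtype.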
Upvotes: 5