Reputation: 4842
I am new to pandas.
My DataFrame looks like this:
a1 b1 c1 d1 e1
A 10 10 1 2 0
B 20 20 2 1 1
C 30 30 3 1 0
D 40 40 4 1 1
E 40 40 4 1 2
F 40 40 4 1 1
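To make the example reproducible, here is a minimal sketch that rebuilds the frame shown above. It assumes the row labels A to F are the index and that all values are plain integers.

```python
import pandas as pd

# Rebuild the example frame from the question; row labels A-F become the index
df = pd.DataFrame(
    {
        "a1": [10, 20, 30, 40, 40, 40],
        "b1": [10, 20, 30, 40, 40, 40],
        "c1": [1, 2, 3, 4, 4, 4],
        "d1": [2, 1, 1, 1, 1, 1],
        "e1": [0, 1, 0, 1, 2, 1],
    },
    index=list("ABCDEF"),
)
print(df)
```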
I want to do a calculation only across rows that share the same value of e1.
For example, rows A and C have the same e1 value (0), so their result should be (a1A + a1C) / (c1A + c1C). So I would end up with a DataFrame like this:
a1 b1 c1 d1 e1 result
A 10 10 1 2 0 (a1A + a1C) / (c1A + c1C)
B 20 20 2 1 1 (a1B + a1D + a1F) / (c1B + c1D + c1F)
C 30 30 3 1 0 Do not calculate it because it's already calculated
D 40 40 4 1 1 Do not calculate it because it's already calculated
E 40 40 4 1 2 a1E / c1E
F 40 40 4 1 1 Do not calculate it because it's already calculated
I do not know how to apply a condition to the calculations, or how to skip the calculation for rows whose group has already been calculated.
Thank you for your suggestions.
Upvotes: 0
Views: 81
Reputation: 863166
First aggregate the sums per group, then remove duplicate e1 values with Series.drop_duplicates, and finally use Series.map with the ratio of the summed columns (summed a1 divided by summed c1):
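# sum a1 and c1 per e1 group (double brackets select a list of columns)
s = df.groupby('e1')[['a1','c1']].sum()
# only the first row of each e1 group gets its group's ratio; other rows become NaN
df['new'] = df['e1'].drop_duplicates().map(s.a1 / s.c1)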
print (df)
a1 b1 c1 d1 e1 new
A 10 10 1 2 0 10.0
B 20 20 2 1 1 10.0
C 30 30 3 1 0 NaN
D 40 40 4 1 1 NaN
E 40 40 4 1 2 10.0
F 40 40 4 1 1 NaN
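If it helps to see explicitly which rows receive a value, here is a minimal sketch of an equivalent variant using a mask instead of drop_duplicates. Series.duplicated marks every repeat of an e1 value after its first occurrence, and Series.where turns those rows into NaN. The names sums, ratio and first_in_group are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a1": [10, 20, 30, 40, 40, 40],
        "b1": [10, 20, 30, 40, 40, 40],
        "c1": [1, 2, 3, 4, 4, 4],
        "d1": [2, 1, 1, 1, 1, 1],
        "e1": [0, 1, 0, 1, 2, 1],
    },
    index=list("ABCDEF"),
)

# Per-group ratio sum(a1) / sum(c1), indexed by the e1 value
sums = df.groupby("e1")[["a1", "c1"]].sum()
ratio = sums["a1"] / sums["c1"]

# True only for the first row of each e1 group (A, B and E here)
first_in_group = ~df["e1"].duplicated()

# Look up each row's group ratio, then keep it only on first occurrences
df["new"] = df["e1"].map(ratio).where(first_in_group)
print(df)
```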
Also, in pandas it is usually not necessary to compute the result only for the unique values. The more common approach is GroupBy.transform, which returns the group sums aligned to the original rows, so the new column is filled for every row:
# transform broadcasts each group's sums back to every row of that group
df2 = df.groupby('e1')[['a1','c1']].transform('sum')
print (df2)
a1 c1
A 40 4
B 100 10
C 40 4
D 100 10
E 40 4
F 100 10
df['new'] = df2.a1 / df2.c1
print (df)
a1 b1 c1 d1 e1 new
A 10 10 1 2 0 10.0
B 20 20 2 1 1 10.0
C 30 30 3 1 0 10.0
D 40 40 4 1 1 10.0
E 40 40 4 1 2 10.0
F 40 40 4 1 1 10.0
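As a quick check that the two approaches agree once every row is filled, the sketch below computes the ratio once with transform and once by mapping one ratio per group onto all rows. The variable names via_transform and via_map are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a1": [10, 20, 30, 40, 40, 40],
        "b1": [10, 20, 30, 40, 40, 40],
        "c1": [1, 2, 3, 4, 4, 4],
        "d1": [2, 1, 1, 1, 1, 1],
        "e1": [0, 1, 0, 1, 2, 1],
    },
    index=list("ABCDEF"),
)

# Approach 1: transform returns group sums aligned to the original rows
df2 = df.groupby("e1")[["a1", "c1"]].transform("sum")
via_transform = df2["a1"] / df2["c1"]

# Approach 2: compute one ratio per group, then map it onto every row
sums = df.groupby("e1")[["a1", "c1"]].sum()
via_map = df["e1"].map(sums["a1"] / sums["c1"])

print(via_transform.tolist())
print(via_map.tolist())
```

transform is convenient when you need several aggregated columns row by row; mapping a per-group Series is convenient when you only need one value per group.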
Upvotes: 3