Reputation: 41
I want to use an R-style mutate function that can draw on information from other columns. For example, I want to create a new column whose values come from first grouping the rows and then interpolating one column against another column in the same data frame. The new column gets the same value for every row in a group.
I tried using apply with broadcast, but it only produces NaN values.
import pandas as pd
import numpy as np
d = {
    'Gain': [20, 20, 19, 18, 17, 21, 21, 20, 19, 18],
    'Power': [30, 31, 32, 33, 34, 33, 34, 35, 36, 37],
    'GRP': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
}
df = pd.DataFrame(data=d)
# Subtract the value of Gain from the maximum value: THIS STEP WORKS
df['dGain']=df.groupby(['GRP'])['Gain'].transform(lambda x: max(x) - x)
# DOES NOT WORK!!!
df['Pcomp']=df.groupby(['GRP']).transform(lambda x: np.interp(3,x.dGain,x.Power))
# DOES NOT WORK
df['Pcomp']=df.groupby(['GRP']).apply(lambda x: np.interp(3,x.dGain,x.Power))
I expected:
Gain Power GRP Pcomp dGain
0 20 30 A 33 0
1 20 31 A 33 0
2 19 32 A 33 1
3 18 33 A 33 2
4 17 34 A 33 3
5 21 33 B 36 0
6 21 34 B 36 0
7 20 35 B 36 1
8 19 36 B 36 2
9 18 37 B 36 3
Upvotes: 0
Views: 85
Reputation: 323306
We can say that transform is almost equivalent to mutate in R's dplyr, but they differ slightly: on a groupby object, transform is passed one column at a time, whereas mutate can work with multiple columns.
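To illustrate the point above, here is a minimal sketch using the question's data. It assumes that selecting the two columns explicitly before apply is acceptable (that selection is my addition, not part of the original attempt). apply hands the lambda the whole sub-frame, so both columns are reachable, but the result has one entry per group, indexed by the group label. Assigning that directly aligns on the index, which is probably where the NaN values in the question come from.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Gain': [20, 20, 19, 18, 17, 21, 21, 20, 19, 18],
    'Power': [30, 31, 32, 33, 34, 33, 34, 35, 36, 37],
    'GRP': ['A'] * 5 + ['B'] * 5,
})
df['dGain'] = df.groupby('GRP')['Gain'].transform(lambda x: x.max() - x)

# apply receives each group's sub-frame, so both columns can be used together.
# The result is a Series with one value per group, indexed by 'A' and 'B'.
per_group = df.groupby('GRP')[['dGain', 'Power']].apply(
    lambda g: np.interp(3, g['dGain'], g['Power'])
)

# Assigning it as a column aligns on the index: the labels 'A'/'B' never
# match the row labels 0..9, so every row ends up NaN.
misaligned = df.assign(Pcomp=per_group)['Pcomp']
```

The per-group values still have to be spread back over the rows, which is what the fix below does.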
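A quick fix is to compute one value per group with apply, then reindex by GRP so the result lines up with the rows: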
df['Pcomp']=df.groupby('GRP').apply(lambda x: np.interp(3,x['dGain'],x['Power'])).reindex(df.GRP).values
df
Out[828]:
Gain Power GRP dGain Pcomp
0 20 30 A 0 34.0
1 20 31 A 0 34.0
2 19 32 A 1 34.0
3 18 33 A 2 34.0
4 17 34 A 3 34.0
5 21 33 B 0 37.0
6 21 34 B 0 37.0
7 20 35 B 1 37.0
8 19 36 B 2 37.0
9 18 37 B 3 37.0
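As an alternative to the reindex step, the per-group values can be collected in a dict and mapped onto the GRP column. This is only a sketch of one possible variant, not a claim about the preferred pandas idiom; the name pcomp_by_group is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Gain': [20, 20, 19, 18, 17, 21, 21, 20, 19, 18],
    'Power': [30, 31, 32, 33, 34, 33, 34, 35, 36, 37],
    'GRP': ['A'] * 5 + ['B'] * 5,
})
df['dGain'] = df.groupby('GRP')['Gain'].transform(lambda x: x.max() - x)

# Hypothetical helper dict: one interpolated Power value per group label.
pcomp_by_group = {
    name: float(np.interp(3, g['dGain'], g['Power']))
    for name, g in df.groupby('GRP')
}

# map looks up each row's group label, so every row gets its group's value.
df['Pcomp'] = df['GRP'].map(pcomp_by_group)
```

Because map works row by row on the GRP labels, it does not depend on the rows being sorted by group.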
Upvotes: 2