Reputation: 4914
I have a dataframe that I would like to calculate a new column for. The column would contain ax+by; x and y are columns while a and b are scalars. The catch is that the scalar coefficients are different for different groups defined in a third column 'groups'. I tried doing this by looping over the groups but I get NULL as my output. "Coefficients" is a dictionary of lists [a,b]
for group in df['groups'].unique():
coef_list = Coefficients[group]
temp = df[df['groups']==group ]
df['calculation'] = coef_list[0] * temp['x'] + coef_list[1] * temp['y']
I thought it would work by only adding values by matching on index. Do I need to use assign or something.
Upvotes: 1
Views: 37
Reputation: 294218
Consider the dataframe df
and Coefficients dictionary coef
coef = dict(G1=[1, 2], G2=[3, 4])
df = pd.DataFrame(dict(
groups=['G1'] * 4 + ['G2'] * 4,
x=range(1, 9), y=range(1, 9)[::-1]
))
Solution
We can generate the grouping values with map
and use of the pd.DataFrame
constructor
myvars = ['x', 'y']
gvals = pd.DataFrame(df.groups.map(coef).values.tolist(), df.index, myvars)
gvals
x y
0 1 2
1 1 2
2 1 2
3 1 2
4 3 4
5 3 4
6 3 4
7 3 4
We can then assign
a new column after multiplying and summing.
df.assign(calculation=gvals.mul(df[myvars]).sum(1))
groups x y calculation
0 G1 1 8 17
1 G1 2 7 16
2 G1 3 6 15
3 G1 4 5 14
4 G2 5 4 31
5 G2 6 3 30
6 G2 7 2 29
7 G2 8 1 28
This can be made quicker by accessing the numpy arrays
df.assign(calculation=(gvals.values * df[myvars].values).sum(1))
Upvotes: 2