Keith
Keith

Reputation: 4914

Calculating dataframe column dependant on group

I have a dataframe that I would like to calculate a new column for. The column would contain ax+by; x and y are columns while a and b are scalars. The catch is that the scalar coefficients are different for different groups defined in a third column 'groups'. I tried doing this by looping over the groups but I get NULL as my output. "Coefficients" is a dictionary of lists [a,b]

for group in df['groups'].unique():
    coef_list = Coefficients[group]
    temp = df[df['groups']==group ]
    df['calculation'] = coef_list[0] * temp['x'] + coef_list[1] * temp['y'] 

I thought it would work by only adding values by matching on index. Do I need to use assign or something.

Upvotes: 1

Views: 37

Answers (1)

piRSquared
piRSquared

Reputation: 294218

Consider the dataframe df and Coefficients dictionary coef

coef = dict(G1=[1, 2], G2=[3, 4])
df = pd.DataFrame(dict(
        groups=['G1'] * 4 + ['G2'] * 4,
        x=range(1, 9), y=range(1, 9)[::-1]
    ))

Solution

We can generate the grouping values with map and use of the pd.DataFrame constructor

myvars = ['x', 'y']
gvals = pd.DataFrame(df.groups.map(coef).values.tolist(), df.index, myvars)
gvals

   x  y
0  1  2
1  1  2
2  1  2
3  1  2
4  3  4
5  3  4
6  3  4
7  3  4

We can then assign a new column after multiplying and summing.

df.assign(calculation=gvals.mul(df[myvars]).sum(1))

  groups  x  y  calculation
0     G1  1  8           17
1     G1  2  7           16
2     G1  3  6           15
3     G1  4  5           14
4     G2  5  4           31
5     G2  6  3           30
6     G2  7  2           29
7     G2  8  1           28

This can be made quicker by accessing the numpy arrays

df.assign(calculation=(gvals.values * df[myvars].values).sum(1))

Upvotes: 2

Related Questions