GISUser9
GISUser9

Reputation: 125

How to calculate new "normalized" column in a Pandas dataframe?

Given df:

    Value Machine
0      10       A
1      20       A
2      30       A
5      100      B
6      200      B
7      300      B

How can I calculate a new column df['norm'] where values for Machine A are divided (element-wise) by the average of values of Machine A, and the values for Machine B are divided by the average of values of Machine B, yielding:

   Value  Machine  norm
0      10       A  0.5
1      20       A  1.0
2      30       A  1.5
5      100      B  0.5
6      200      B  1.0
7      300      B  1.5

Note that the number of unique machines is variable.

I can calculate the average of each machine using groupby like this:

machine_mean = df.groupby("Machine").mean().reset_index()

I've tried looping through these means, but I can't figure out how to do the element-wise referencing, while at the same time only applying each mean to the right rows.

I thought list comprehension would work, but I had the same problem as above.

Upvotes: 0

Views: 35

Answers (3)

ccolin
ccolin

Reputation: 304

You can use groupby().apply():

df['norm'] = df.groupby('Machine')['Value'].apply(lambda x: x / x.mean())

Upvotes: 0

BENY
BENY

Reputation: 323276

Let us do transform with mean

df['norm'] = df['Value']/df.groupby('Machine')['Value'].transform('mean')
df
Out[209]: 
   Value Machine  norm
0     10       A   0.5
1     20       A   1.0
2     30       A   1.5
5    100       B   0.5
6    200       B   1.0
7    300       B   1.5

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150745

Use groupby().transform:

df['norm'] = df['Value']/df.groupby('Machine')['Value'].transform('sum')

Upvotes: 0

Related Questions