Maxim Skoryk
Maxim Skoryk

Reputation: 492

Pandas: normalize values by group

I find it hard to explain with words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like this:

main_col some_metadata value
this True 10
this False 3
that True 50
that False 10
other True 20
other False 5

I want to normalize this data separately for each case of main_col. For example, if we're to choose min-max normalization and scale it to range [0; 100], I want the output to look like this:

main_col some_metadata value (normalized)
this True 100
this False 30
that True 100
that False 20
other True 100
other False 25

Where for each case of main_col, the highest value is scaled to 100 and another value is scaled in respective proportion.

Upvotes: 5

Views: 1260

Answers (2)

Nuri Taş
Nuri Taş

Reputation: 3845

The normalization formula you are looking for is 100 * (x / x.max()):

df.groupby(['main_col'])['value'].transform(lambda x: 100 * (x / x.max()))

Result:

0    100.0
1     30.0
2    100.0
3     20.0
4    100.0
5     25.0
Name: value, dtype: float64

Upvotes: 1

mozway
mozway

Reputation: 262484

You can use groupby.transform('max') to get the max per group, then normalize in place:

df['value'] /= df.groupby('main_col')['value'].transform('max').div(100)

or:

df['value'] *= df.groupby('main_col')['value'].transform('max').rdiv(100)

output:

  main_col  some_metadata  value
0     this           True  100.0
1     this          False   30.0
2     that           True  100.0
3     that          False   20.0
4    other           True  100.0
5    other          False   25.0

Upvotes: 5

Related Questions