Pandas dataframe scale column based on another column

Question

I've got a Dataframe that looks like this:

    cat   val
0     1    10
1     1     4
2     2     6
3     2     2
4     1     8
5     2    12

Where cat is category, and val is value. I would like to create a column, called scaled, that is linearly scaled/normalized to 0-1, on a per-category basis. I know how to do the former - ((val - min) / (max - min)) - at the column level, and I also know how to perform operations on a per-category basis, I just don't know how to combine the two. The desired result is:

    cat   val  scaled
0     1    10       1  
1     1     4       0
2     2     6     0.4
3     2     2       0
4     1     8   0.667
5     2    12       1

Ideally I'd like to stick to using Pandas only.

Any help would be appreciated, thank you!

ALollz · Accepted Answer

Your scaling is to subtract the min and divide by the range, so use groupby + transform to broadcast those properties back to every row for that group and do the math.

import numpy as np

gp = df.groupby('cat')['val']

df['scaled'] = (df['val'] - gp.transform(min))/gp.transform(np.ptp)

   cat  val    scaled
0    1   10  1.000000
1    1    4  0.000000
2    2    6  0.400000
3    2    2  0.000000
4    1    8  0.666667
5    2   12  1.000000

For aggregations that reduce to a scalar, groupby + agg/apply reduces to a single row per group; however groupby + transform returns a like-Indexed Series so that it aligns to the original DataFrame.

gp.min()
#cat
#1    4
#2    2
#Name: val, dtype: int64

gp.transform(min)
#0    4
#1    4
#2    2
#3    2
#4    4
#5    2
#Name: val, dtype: int64

Pandas dataframe scale column based on another column

Answers (2)

Related Questions