Reputation: 835
I've got a Dataframe that looks like this:
cat val
0 1 10
1 1 4
2 2 6
3 2 2
4 1 8
5 2 12
Where cat
is category, and val
is value. I would like to create a column, called scaled
, that is linearly scaled/normalized to 0-1, on a per-category basis.
I know how to do the former - ((val - min) / (max - min))
- at the column level, and I also know how to perform operations on a per-category basis, I just don't know how to combine the two. The desired result is:
cat val scaled
0 1 10 1
1 1 4 0
2 2 6 0.4
3 2 2 0
4 1 8 0.667
5 2 12 1
Ideally I'd like to stick to using Pandas only.
Any help would be appreciated, thank you!
Upvotes: 2
Views: 1089
Reputation: 512
You can use the following lines of code to do the scaling based on another column
import pandas as pd
df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15]})
# Normalize around mean
df['mean_normal'] = df.groupby('Group').transform(lambda x: (x - x.mean()/ x.std()))
# Normalize between 0 and 1
df['min_max_normal'] = df.groupby('Group').transform(lambda x: ((x - x.min())/ (x.max() - x.min())))
Upvotes: 0
Reputation: 59529
Your scaling is to subtract the min and divide by the range, so use groupby
+ transform
to broadcast those properties back to every row for that group and do the math.
import numpy as np
gp = df.groupby('cat')['val']
df['scaled'] = (df['val'] - gp.transform(min))/gp.transform(np.ptp)
cat val scaled
0 1 10 1.000000
1 1 4 0.000000
2 2 6 0.400000
3 2 2 0.000000
4 1 8 0.666667
5 2 12 1.000000
For aggregations that reduce to a scalar, groupby
+ agg/apply
reduces to a single row per group; however groupby
+ transform
returns a like-Indexed Series so that it aligns to the original DataFrame.
gp.min()
#cat
#1 4
#2 2
#Name: val, dtype: int64
gp.transform(min)
#0 4
#1 4
#2 2
#3 2
#4 4
#5 2
#Name: val, dtype: int64
Upvotes: 2