Reputation: 1332
I would like to normalize the dataset below within each group, using the formula
(x - min(x)) / (max(x) - min(x))
How can I do that in a pandas DataFrame? I need both price and size normalized. Thank you.
import pandas as pd

data = [['Group 1', 10, 100],
        ['Group 1', 20, 80],
        ['Group 1', 15, 60],
        ['Group 1', 10, 120],
        ['Group 2', 10, 120],
        ['Group 2', 20, 130],
        ['Group 2', 30, 200],
        ['Group 2', 40, 250],
        ['Group 2', 50, 300]]
df = pd.DataFrame(data, columns=['Group', 'price', 'size'])
Upvotes: 3
Views: 4052
Reputation: 862661
Use GroupBy.apply with a custom function:
cols = ['price','size']
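# group_keys=False keeps the original row index so the result aligns with df
# (on recent pandas versions, apply otherwise prepends the group key to the index)
df[cols] = df.groupby('Group', group_keys=False)[cols].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
print(df)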
     Group  price      size
0  Group 1   0.00  0.666667
1  Group 1   1.00  0.333333
2  Group 1   0.50  0.000000
3  Group 1   0.00  1.000000
4  Group 2   0.00  0.000000
5  Group 2   0.25  0.055556
6  Group 2   0.50  0.444444
7  Group 2   0.75  0.722222
8  Group 2   1.00  1.000000
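Or compute the per-group minimum and maximum with GroupBy.transform and join the normalized values as new columns (starting from the original df):
cols = ['price','size']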
g = df.groupby('Group')[cols]
min1 = g.transform('min')
max1 = g.transform('max')
df1 = df.join(df[cols].sub(min1).div(max1 - min1).add_suffix('_norm'))
print(df1)
     Group  price  size  price_norm  size_norm
0  Group 1     10   100        0.00   0.666667
1  Group 1     20    80        1.00   0.333333
2  Group 1     15    60        0.50   0.000000
3  Group 1     10   120        0.00   1.000000
4  Group 2     10   120        0.00   0.000000
5  Group 2     20   130        0.25   0.055556
6  Group 2     30   200        0.50   0.444444
7  Group 2     40   250        0.75   0.722222
8  Group 2     50   300        1.00   1.000000
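One edge case worth noting: if every value in a group is identical, max - min is 0 and the formula gives NaN for that group. Below is a minimal sketch of one way to handle it, assuming constant groups should map to 0.0. The helper name minmax_by_group and its constant_value parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

def minmax_by_group(df, group_col, cols, constant_value=0.0):
    """Min-max scale `cols` within each group of `group_col` (hypothetical helper)."""
    g = df.groupby(group_col)[cols]
    lo = g.transform("min")            # per-row group minimum
    span = g.transform("max") - lo     # per-row group range (max - min)
    scaled = (df[cols] - lo) / span    # 0/0 becomes NaN where span == 0
    # Assumption: a group with zero range is mapped to constant_value.
    return scaled.mask(span.eq(0), constant_value)

demo = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "price": [10, 20, 5, 5],   # group B is constant in price
    "size": [1, 3, 7, 9],
})
result = minmax_by_group(demo, "Group", ["price", "size"])
print(result)
```

Whether 0.0, 0.5 or NaN is the right value for a constant group depends on what the normalized column is used for, so treat the default here as a placeholder.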
Upvotes: 2
Reputation: 4792
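# Select the numeric columns explicitly so extra columns in df do not break the assignment
df[['normalized_price', 'normalized_size']] = df.groupby('Group')[['price', 'size']].transform(lambda x: (x - x.min()) / (x.max() - x.min()))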
df
     Group  price  size  normalized_price  normalized_size
0  Group 1     10   100              0.00         0.666667
1  Group 1     20    80              1.00         0.333333
2  Group 1     15    60              0.50         0.000000
3  Group 1     10   120              0.00         1.000000
4  Group 2     10   120              0.00         0.000000
5  Group 2     20   130              0.25         0.055556
6  Group 2     30   200              0.50         0.444444
7  Group 2     40   250              0.75         0.722222
8  Group 2     50   300              1.00         1.000000
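To check that the lambda-based transform and the vectorized min/max version give the same numbers, something like the following could be run on the question's data. This is only a sketch, and the variable names by_lambda and vectorized are illustrative.

```python
import pandas as pd

data = [['Group 1', 10, 100], ['Group 1', 20, 80], ['Group 1', 15, 60],
        ['Group 1', 10, 120], ['Group 2', 10, 120], ['Group 2', 20, 130],
        ['Group 2', 30, 200], ['Group 2', 40, 250], ['Group 2', 50, 300]]
df = pd.DataFrame(data, columns=['Group', 'price', 'size'])
cols = ['price', 'size']

# Variant 1: transform with a lambda, applied per column within each group
by_lambda = df.groupby('Group')[cols].transform(
    lambda x: (x - x.min()) / (x.max() - x.min()))

# Variant 2: built-in 'min'/'max' transforms, then plain column arithmetic
g = df.groupby('Group')[cols]
min1 = g.transform('min')
max1 = g.transform('max')
vectorized = (df[cols] - min1) / (max1 - min1)

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(by_lambda, vectorized)

# After scaling, each group should span exactly 0 to 1 in every column
print(by_lambda.groupby(df['Group']).agg(['min', 'max']))
```

The second variant avoids calling a Python function per group, which tends to matter more as the number of groups grows.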
Upvotes: 3