Reputation: 749
I'm trying to normalize a single feature to [0, 1], but every value in the result comes back as 1.0, which is clearly wrong.
import pandas as pd
import numpy as np
from sklearn.preprocessing import normalize
test = pd.DataFrame(data=[7, 6, 5, 2, 9, 9, 7, 8, 6, 5], columns=['data'])
normalize(test['data'].values.reshape(-1, 1))
This produces the following output:
array([[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.],
[1.]])
I thought this might be an int-to-float datatype issue, so I tried casting to float first:
normalize(test['data'].astype(float).values.reshape(-1, 1))
This gives the same result. What am I missing?
Upvotes: 3
Views: 2580
Reputation: 323226
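For a [0, 1] range you can do min-max scaling directly: subtract the minimum and divide by the range (np.ptp returns max - min):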
(test.data-test.data.min())/np.ptp(test.data.values)
Output:
0 0.714286
1 0.571429
2 0.428571
3 0.000000
4 1.000000
5 1.000000
6 0.714286
7 0.857143
8 0.571429
9 0.428571
Name: data, dtype: float64
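As a cross-check, here is a minimal sketch comparing the manual formula above with scikit-learn's MinMaxScaler. The scaler is not mentioned in the question; it is brought in here only as an assumption about what a library equivalent might look like. The variable names manual and scaled are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

test = pd.DataFrame(data=[7, 6, 5, 2, 9, 9, 7, 8, 6, 5], columns=['data'])

# Manual min-max scaling: (x - min) / (max - min)
manual = (test['data'] - test['data'].min()) / np.ptp(test['data'].values)

# MinMaxScaler expects 2-D input, so pass a one-column DataFrame
# and flatten the (n, 1) result back to 1-D.
scaled = MinMaxScaler().fit_transform(test[['data']]).ravel()

print(np.allclose(manual.values, scaled))
```

Both approaches map the smallest value (2) to 0 and the largest (9) to 1. The scaler is mainly worth it when the same transformation has to be reapplied to new data later.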
Upvotes: 2
Reputation: 29732
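This is because the default axis is 1, so normalize scales each row to unit norm on its own. After reshape(-1, 1) every row holds a single value, and a single value divided by its own norm is always 1.
Set axis=0 to normalize along the column instead. Note that this scales the column to unit L2 norm; it does not map the minimum to 0 and the maximum to 1.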
normalize(test['data'].values.reshape(-1, 1), axis=0)
Output:
array([[0.32998316],
[0.28284271],
[0.23570226],
[0.0942809 ],
[0.42426407],
[0.42426407],
[0.32998316],
[0.37712362],
[0.28284271],
[0.23570226]])
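To make the effect of the axis argument concrete, here is a small sketch that reproduces both results and checks the axis=0 case against a plain NumPy division by the column's L2 norm. The names x, row_wise, col_wise and manual_col are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import normalize

x = np.array([7, 6, 5, 2, 9, 9, 7, 8, 6, 5], dtype=float).reshape(-1, 1)

# Default axis=1: each row is one sample, scaled to unit L2 norm.
# A row with a single positive value v becomes v / |v| = 1.
row_wise = normalize(x)

# axis=0: the whole column is treated as one vector and divided by its L2 norm.
col_wise = normalize(x, axis=0)

# The same computation by hand: sqrt(sum of squares) = sqrt(450)
manual_col = x / np.linalg.norm(x, axis=0)

print(np.allclose(row_wise, 1.0))
print(np.allclose(col_wise, manual_col))
```

Because the result is a unit-norm vector, its values happen to fall inside [0, 1] for non-negative data, but the smallest value is not mapped to 0.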
Upvotes: 7