Reputation: 315
Let's say I have a pandas DataFrame and I want to normalize only some of its columns, not the whole DataFrame, using this function:
preprocessing.normalize
I want to write the normalized columns back into my DataFrame in place, but I can't because the result has a different format (a NumPy array).
I have already seen how to do the normalization in other ways; for example, I did this:
s0 = X.iloc[:,13:15]
X.iloc[:,13:15] = (s0 - s0.mean()) / (s0.max() - s0.min())
X.head()
But I really need to do it using sklearn. Thanks, Stack!
Upvotes: 0
Views: 5080
Reputation: 1072
(s0 - s0.mean()) / (s0.max() - s0.min()) is called Mean normalization and as far as I am aware, there is no transformer in Scikit-learn to carry out this transformation.
The MinMaxScaler transforms following this formula: (s0 - s0.min()) / (s0.max() - s0.min())
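As a rough illustration of how the two formulas differ, the sketch below applies both to a small made-up DataFrame (the column names and values are assumptions for the example, not from the question) and checks that MinMaxScaler with its default settings agrees with the second formula:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data, only for illustration
s0 = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 30.0, 20.0]})

# Mean normalization: each column is centred on 0 and divided by its range
mean_norm = (s0 - s0.mean()) / (s0.max() - s0.min())

# Min-max scaling: each column is mapped onto [0, 1]
min_max = (s0 - s0.min()) / (s0.max() - s0.min())

# MinMaxScaler (default feature_range=(0, 1)) should reproduce min_max
sk_min_max = MinMaxScaler().fit_transform(s0)
same = np.allclose(min_max.to_numpy(), sk_min_max)
```

The only difference is the offset: mean normalization produces columns with mean 0, while min-max scaling produces columns whose minimum is 0 and maximum is 1.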
You can do this transformation on selected variables with scikit-learn as follows:
dirty way:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # or any other scaler from sklearn
cols = [var1, var2, var20]  # labels of the columns to scale
X_transf = X.copy()  # leave the original X untouched
X_transf[cols] = scaler.fit_transform(X[cols])
better way using the ColumnTransformer:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features_numerical = [var1, var2, var20]
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])
preprocessor = ColumnTransformer(
    transformers=[('numerical', numeric_transformer, features_numerical)],
    remainder='passthrough',  # to keep all other features in the data set
)
preprocessor.fit_transform(X)
The returned value is a NumPy array, so it needs to be converted back into a pandas DataFrame and given its column names again.
More information on how to use column transformer from sklearn here.
You need to import the ColumnTransformer and the Pipeline from sklearn, as well as the scaler of choice.
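Converting the array back into a DataFrame could look roughly like the sketch below. The frame and its column names ("x1", "x2", "keep") are hypothetical stand-ins for var1, var2 and the untouched columns. It relies on ColumnTransformer placing the transformed columns first and appending the passthrough columns after them:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame, only for illustration
X = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "keep": [7, 8, 9], "x2": [10.0, 20.0, 40.0]})
features_numerical = ["x1", "x2"]

preprocessor = ColumnTransformer(
    transformers=[("numerical", StandardScaler(), features_numerical)],
    remainder="passthrough",
)
arr = preprocessor.fit_transform(X)

# Transformed columns come first; passthrough columns follow in their original order
passthrough = [c for c in X.columns if c not in features_numerical]
X_transf = pd.DataFrame(arr, columns=features_numerical + passthrough, index=X.index)
```

Note that the column order of X_transf differs from that of X. Recent scikit-learn releases (1.2 and later, as far as I know) also offer set_output(transform="pandas") on transformers, which may make the manual conversion unnecessary.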
Upvotes: 1
Reputation: 36599
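What you are doing is a variant of min-max scaling (you subtract the mean rather than the minimum). "normalize" in scikit-learn has a different meaning than what you want to do: it rescales each sample (row) to unit norm.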
Try MinMaxScaler.
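To make the difference concrete, here is a small sketch on made-up numbers (the array is an assumption for illustration). It shows that normalize works per row, while MinMaxScaler works per column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Hypothetical data, only for illustration
data = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]])

# normalize: each ROW is divided by its own L2 norm (defaults: norm="l2", axis=1)
row_unit = normalize(data)

# MinMaxScaler: each COLUMN is mapped onto [0, 1]
col_scaled = MinMaxScaler().fit_transform(data)

row_norms = np.linalg.norm(row_unit, axis=1)  # every row now has length 1
```

Here the first two rows, [3, 4] and [6, 8], both become [0.6, 0.8] under normalize, because only the direction of each row is kept. That is usually not what is wanted when rescaling features.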
Most sklearn transformers return NumPy arrays only. With a DataFrame, you can simply assign the result back to the columns, as in the example below:
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])
Now let's say you only want to min-max scale columns A and C:
from sklearn.preprocessing import MinMaxScaler
minmax = MinMaxScaler()
df[['A', 'C']] = minmax.fit_transform(df[['A', 'C']])
Upvotes: 3