dodo4545

Reputation: 315

Normalization sklearn

Let's say I have a pandas DataFrame and I want to normalize only some of its columns, not the whole DataFrame, using this function:

preprocessing.normalize

I then want to write these normalized columns back into my DataFrame, but I can't because the result has a different format (a NumPy array).

I have already seen other ways to do the normalization. For example, I did this:

s0 = X.iloc[:,13:15] 
X.iloc[:,13:15] = (s0 - s0.mean()) / (s0.max() - s0.min())
X.head()

But I really need to do it using sklearn. Thanks, Stack!

Upvotes: 0

Views: 5080

Answers (2)

Sole Galli

Reputation: 1072

(s0 - s0.mean()) / (s0.max() - s0.min()) is called mean normalization and, as far as I am aware, there is no single transformer in scikit-learn that carries out this transformation.
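If mean normalization specifically is needed while still leaning on sklearn, one possible workaround is to let StandardScaler(with_std=False) do the centering and divide by the column range afterwards. This is only a sketch: the DataFrame X and the list cols below are made-up illustration data, not the asker's data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; "c" stands for a column that must stay untouched.
X = pd.DataFrame({"a": [1.0, 2.0, 6.0], "b": [10.0, 20.0, 30.0], "c": [5, 6, 7]})
cols = ["a", "b"]  # columns to mean-normalize (assumed names)

# with_std=False makes StandardScaler subtract the column mean only.
centered = StandardScaler(with_std=False).fit_transform(X[cols])

# Range (max - min) of each selected column, computed before overwriting them.
value_range = (X[cols].max() - X[cols].min()).to_numpy()

# Assign the NumPy result straight back into the selected columns.
X[cols] = centered / value_range
print(X)
```

The division is done outside sklearn here, so this is a combination of steps rather than a dedicated transformer.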

The MinMaxScaler transforms following this formula: (s0 - s0.min()) / (s0.max() - s0.min())
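As a quick sanity check of that formula, the sketch below compares the manual computation with MinMaxScaler on a small made-up DataFrame (the data and the name s0 are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data, only for illustration.
s0 = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 30.0, 20.0]})

# Manual min-max scaling, applied column by column by pandas.
manual = (s0 - s0.min()) / (s0.max() - s0.min())

# The same transformation through scikit-learn (returns a NumPy array).
scaled = MinMaxScaler().fit_transform(s0)

# True when both approaches agree up to floating-point tolerance.
matches = np.allclose(manual.to_numpy(), scaled)
print(matches)
```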

You can do this transformation on selected variables with scikit-learn as follows:

dirty way:

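from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # or any other scaler from sklearn
scaler.fit(X[[var1, var2, var20]])  # var1, var2, var20 hold column names
X_transf = X.copy()  # copy so the original DataFrame is left untouched
X_transf[[var1, var2, var20]] = scaler.transform(X[[var1, var2, var20]])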

Better way, using the ColumnTransformer:

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features_numerical = [var1, var2, var20]
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])
preprocessor = ColumnTransformer(
    transformers=[('numerical', numeric_transformer, features_numerical)],
    remainder='passthrough')  # to keep all other features in the data set
preprocessor.fit_transform(X)

The returned object is a NumPy array, so it needs to be re-cast into a pandas DataFrame with the variable names added back.
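A minimal sketch of that re-casting step, assuming made-up column names and data: with remainder='passthrough', the output holds the transformed columns first and the passed-through columns after them, so the column names have to be rebuilt in that order.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; "other" is a column that should pass through unchanged.
X = pd.DataFrame(
    {"var1": [1.0, 2.0, 3.0], "other": [7, 8, 9], "var2": [10.0, 20.0, 40.0]}
)
features_numerical = ["var1", "var2"]

preprocessor = ColumnTransformer(
    transformers=[("numerical", StandardScaler(), features_numerical)],
    remainder="passthrough",
)
array = preprocessor.fit_transform(X)  # plain NumPy array by default

# Transformed columns come first, followed by the remaining ones in original order.
remaining = [c for c in X.columns if c not in features_numerical]
X_transf = pd.DataFrame(array, columns=features_numerical + remaining, index=X.index)
print(X_transf)
```

Note that the column order of X_transf differs from that of X, and passed-through columns end up sharing the array's dtype.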

More information on how to use the ColumnTransformer is available in the scikit-learn documentation.

You need to import the ColumnTransformer and the Pipeline from sklearn, as well as the scaler of choice.

Upvotes: 1

Vivek Kumar

Reputation: 36599

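What you are doing is close to min-max scaling (your formula subtracts the mean rather than the minimum). "normalize" in scikit-learn has a different meaning than what you want to do: it rescales each sample (row) to unit norm.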

Try MinMaxScaler.
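To illustrate the difference, here is a small sketch of what preprocessing.normalize does with its default arguments (norm='l2', axis=1) on made-up data: it rescales each row to unit length instead of rescaling each column to a fixed range.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical toy data: two samples (rows), two features (columns).
data = np.array([[3.0, 4.0], [1.0, 0.0]])

# Default behaviour: divide every row by its own L2 norm.
unit_rows = normalize(data)

# Each row of the result has Euclidean length 1.
row_norms = np.linalg.norm(unit_rows, axis=1)
print(unit_rows)
print(row_norms)
```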

Most sklearn transformers output NumPy arrays only. For a DataFrame, you can simply re-assign the columns, as in the example below:

import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

Now let's say you only want to min-max scale columns A and C:

from sklearn.preprocessing import MinMaxScaler
minmax = MinMaxScaler()
df[['A', 'C']] = minmax.fit_transform(df[['A', 'C']])

Upvotes: 3
