Ghost

Reputation: 1523

Fit a Normalizer on one array, then transform another in Python with sklearn

I'm not sure if I'm doing something wrong, or if this is not the correct way to do this.

I'm encoding variables in a dataset for a model. I'm using Normalizer() from sklearn.preprocessing to normalize one of my variables, which is numerical.

My dataset is split in two: one subset for training and one for inference. My goal is to normalize this numerical variable (let's call it column x) in the training subset, and then use the same normalization parameters to normalize that variable in the inference subset. The two subsets don't have the same number of entries. What I'm doing is:

nr = Normalizer()
nr.fit([df1.x])
new_col = nr.transform(df1.x)

The problem is that when I try to use the same normalizer on column x in the inference subset, which has a different number of rows:

new_col1 = nr.transform(df2.x)

I get:

X has 10 features, but Normalizer is expecting 697 features as input.

I'm not sure if it's a reshape problem or if Normalizer() shouldn't be used this way, so any advice would be more than welcome.

Upvotes: 1

Views: 578

Answers (1)

Antoine Dubuis

Reputation: 5324

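Normalizer rescales each row (sample) to unit norm, whereas StandardScaler standardizes each column (feature). Normalizer is also stateless: fitting it learns no parameters that could be reused on another dataset. Judging from your question, you want to scale a column, so you should use StandardScaler.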
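To make the row-versus-column distinction concrete, here is a minimal sketch on a small made-up array (the values in X are only for illustration and are not from the question). It checks that Normalizer gives every row unit L2 norm, while StandardScaler gives every column zero mean and unit standard deviation:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Toy data, invented for illustration: 3 samples (rows), 2 features (columns).
X = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [0.0, 5.0]])

# Normalizer works row by row (default norm is 'l2').
row_scaled = Normalizer().fit_transform(X)

# StandardScaler works column by column.
col_scaled = StandardScaler().fit_transform(X)

row_norms = np.linalg.norm(row_scaled, axis=1)  # one norm per row
col_means = col_scaled.mean(axis=0)             # one mean per column
col_stds = col_scaled.std(axis=0)               # one std per column

print(row_norms)
print(col_means, col_stds)
```

With a single numerical column, row-wise normalization would just map every non-zero value to 1 or -1, which is rarely what you want.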

scikit-learn transformers expect a 2D array of shape (n_samples, n_features) as input, but a pandas.Series is a one-dimensional labeled array.

You can fix that by passing a pandas.DataFrame to the transformer, as follows:
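The shape issue probably also explains the error message in the question: wrapping a Series in a list, as in [df1.x], produces a single row with one feature per entry, so the transformer remembers 697 features. A small sketch of the shapes involved, using a made-up 5-row frame rather than the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame standing in for the training subset.
df1 = pd.DataFrame({"x": np.arange(5, dtype=float)})

series_shape = df1["x"].shape                 # 1D: (5,)
wrapped_shape = np.asarray([df1["x"]]).shape  # 1 sample, 5 features: (1, 5)
frame_shape = df1[["x"]].shape                # 5 samples, 1 feature: (5, 1)

# Reshaping the underlying array gives the same 2D layout as df1[["x"]].
reshaped_shape = df1["x"].to_numpy().reshape(-1, 1).shape  # (5, 1)

print(series_shape, wrapped_shape, frame_shape, reshaped_shape)
```

Only the last two layouts treat each row of the dataset as a sample, which is what lets the training and inference subsets have different lengths.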

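import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Example data: training and inference subsets of different lengths
df1 = pd.DataFrame({'x': np.random.uniform(low=0, high=10, size=1000)})
df2 = pd.DataFrame({'x': np.random.uniform(low=0, high=10, size=850)})

scaler = StandardScaler()
# Double brackets select a one-column DataFrame (2D), not a Series (1D)
new_col = scaler.fit_transform(df1[['x']])   # learn mean/std on the training data
new_col1 = scaler.transform(df2[['x']])      # reuse them on the inference data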
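If you want to confirm that the inference data really is scaled with the training statistics, you can compare against a manual computation. This is a sketch with made-up random data; the names train and infer are placeholders, while mean_ and scale_ are the attributes a fitted StandardScaler exposes:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data; sizes mirror the example above.
rng = np.random.default_rng(0)
train = pd.DataFrame({"x": rng.uniform(0, 10, size=1000)})
infer = pd.DataFrame({"x": rng.uniform(0, 10, size=850)})

scaler = StandardScaler().fit(train[["x"]])
train_mean = scaler.mean_[0]   # mean of x in the training subset
train_std = scaler.scale_[0]   # standard deviation of x in the training subset

# transform() applies the stored training statistics, not those of `infer`.
scaled = scaler.transform(infer[["x"]])
manual = (infer[["x"]].to_numpy() - train_mean) / train_std

print(np.allclose(scaled, manual))
```

Because the statistics come from the training subset, the scaled inference column will generally not have exactly zero mean and unit variance, and that is expected.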

Upvotes: 1
