Stupid420

Reputation: 1419

Preserving column names in sklearn preprocessing with Normalizer

I have a pandas dataframe as follows.

import pandas as pd

data = {'First Column Name':  ['12.513362', '13.081390', '15.045193'],
        'Second Column Name': ['24.597206', '25.526964', '29.153882'],
        '3rd Column Name':    ['nan', 'nan', 'nan'],
        '4th Column Name':    ['nan', '2.545', '3.89'],
        }

# The values are strings, so cast to float; 'nan' becomes NaN.
df = pd.DataFrame(data, columns=['First Column Name', 'Second Column Name', '3rd Column Name', '4th Column Name']).astype(float)

df has 3 rows and 4 columns. I then impute the missing values and normalize the result:

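import numpy as np
from sklearn import preprocessing
from sklearn.impute import SimpleImputer
# Impute missing values with the column mean, then scale each row to unit norm.
fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')
df = pd.DataFrame(fill_NaN.fit_transform(df))
normalizer = preprocessing.Normalizer().fit(df)
df = normalizer.transform(df)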

The output is a NumPy array with 3 rows and 3 columns. The column that contains only NaN is discarded, which is fine.

How can I preserve the original column names with this normalization?

Upvotes: 0

Views: 361

Answers (1)

Ben Reiniger

Reputation: 12614

SimpleImputer is the step that drops the column here. You can detect which column index gets dropped from the fitted statistics_ attribute: the entry for a dropped column is np.nan. From the documentation:

statistics_ : array of shape (n_features,)

...

During transform, features corresponding to np.nan statistics will be discarded.

https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
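To make this concrete, here is a minimal sketch of one way to carry the surviving column names through both steps. It assumes the default SimpleImputer behaviour of dropping all-NaN columns and uses numeric values in place of the strings from the question. The names kept_mask, kept_columns and result are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import Normalizer

# Same data as in the question, written as floats (an assumption for clarity).
df = pd.DataFrame({
    "First Column Name": [12.513362, 13.081390, 15.045193],
    "Second Column Name": [24.597206, 25.526964, 29.153882],
    "3rd Column Name": [np.nan, np.nan, np.nan],
    "4th Column Name": [np.nan, 2.545, 3.89],
})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(df)

# A NaN entry in statistics_ marks a column with no observed values,
# which transform discards; keep the names of the remaining columns.
kept_mask = ~np.isnan(imputer.statistics_)
kept_columns = df.columns[kept_mask]

# Normalizer works row-wise and does not change the number of columns.
normalized = Normalizer().fit_transform(imputed)

# Rebuild a DataFrame with the original names and index.
result = pd.DataFrame(normalized, columns=kept_columns, index=df.index)
print(result)
```

Because the mask is derived from the fitted imputer, the same approach should work if a different column happens to be empty, as long as the imputer keeps its default of discarding such columns.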

Upvotes: 1
