I have a pandas DataFrame as follows:
import numpy as np
import pandas as pd

data = {'First Column Name': [12.513362, 13.081390, 15.045193],
        'Second Column Name': [24.597206, 25.526964, 29.153882],
        '3rd Column Name': [np.nan, np.nan, np.nan],
        '4th Column Name': [np.nan, 2.545, 3.89],
        }
df = pd.DataFrame(data)
df has three rows and four columns. Now I apply the following preprocessing with normalization:
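from sklearn import preprocessing
from sklearn.impute import SimpleImputer
# Replace missing values with each column's mean
fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')
df = pd.DataFrame(fill_NaN.fit_transform(df))
# Rescale each row (sample) to unit norm
normalizer = preprocessing.Normalizer().fit(df)
df = normalizer.transform(df)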
I get an output NumPy array of 3 rows and 3 columns. One column, the one that contains only NaN, is discarded, which is fine.
How can I preserve the original column names with this normalization?
SimpleImputer is the one responsible for dropping the column here. You can detect which column (index) gets dropped with the attribute statistics_: its entry for that column will be np.nan.
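A minimal sketch of that idea, using the question's data written as real floats and np.nan (an assumption; the strings in a copy-pasted frame would need converting first). The non-NaN entries of statistics_ select the surviving column names, which are then reattached to the normalized array. The variable names (imputer, kept_columns, result) are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import Normalizer

# Assumption: numeric version of the question's data (floats and np.nan, not strings)
data = {
    "First Column Name": [12.513362, 13.081390, 15.045193],
    "Second Column Name": [24.597206, 25.526964, 29.153882],
    "3rd Column Name": [np.nan, np.nan, np.nan],
    "4th Column Name": [np.nan, 2.545, 3.89],
}
df = pd.DataFrame(data)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(df)  # the all-NaN column is dropped here

# statistics_ has one entry per input column; NaN marks the dropped ones
kept_columns = df.columns[~np.isnan(imputer.statistics_)]

# Normalizer rescales each row to unit (L2) norm and keeps the column order,
# so the surviving names line up with the output columns
normalized = Normalizer().fit_transform(imputed)

result = pd.DataFrame(normalized, columns=kept_columns, index=df.index)
print(result)
```

Because Normalizer works row by row, it never adds, drops, or reorders columns; only the imputer's dropping has to be accounted for.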
statistics_ : array of shape (n_features,)
...
During transform, features corresponding to np.nan statistics will be discarded.
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html