Reputation: 91
I would like to understand how to apply the inverse transformation when the scaler is part of a pipeline, rather than
calling StandardScaler directly.
The code that I am using is the following:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
categoric = X.select_dtypes(['object']).columns
numeric = X.select_dtypes(['int']).columns
tf = ColumnTransformer([('onehot', OneHotEncoder(), categoric),
('scaler', StandardScaler(), numeric)])
X_preprocessed = tf.fit_transform(X)
model = KMeans(n_clusters=2, random_state=24)
model.fit(X_preprocessed)
After getting the output of a given model (KMeans in this case), how can I get back the original scale of the numeric
values of any X dataframe?
I know StandardScaler has a method (.inverse_transform) to do that, but my question is about using it inside a pipeline with ColumnTransformer.
P.S.: The objective of doing so is to interpret the centroids of the model.
Upvotes: 8
Views: 5721
Reputation: 61
You might have already found a solution, but I had a similar issue. I am working with pandas and wanted the ColumnTransformer to return a dataframe again. I do this by putting the column names back in the order the ColumnTransformer uses them. To make sure I hadn't mislabeled any columns, I wanted to invert the transformation and check that it returned the original dataframe.
There are 2 ways to access the sub-transformers inside your tf:
tf.transformers_[1][1] # second (name, transformer, columns) tuple; index 1 is the fitted transformer object
tf.named_transformers_['scaler']
You can then call inverse_transform on that particular sub-transformer. This only inverts one transformer at a time, so you then have to reconstruct your dataset by combining the results of both into one frame again.
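For the centroid use case in the question, a minimal sketch of this might look like the following. The toy dataframe and its column names (color, age, income) are made up for illustration, and the column lists are written out explicitly instead of using select_dtypes. It assumes the output columns follow the order in which the transformers were listed (one-hot columns first, then the scaled ones), which is how the tf above is set up.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for X from the question.
X = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "age": [20, 35, 22, 58, 40, 61],
    "income": [1000, 3000, 1200, 5200, 3100, 5000],
})
categoric = ["color"]
numeric = ["age", "income"]

tf = ColumnTransformer([
    ("onehot", OneHotEncoder(), categoric),
    ("scaler", StandardScaler(), numeric),
])
X_preprocessed = tf.fit_transform(X)

model = KMeans(n_clusters=2, n_init=10, random_state=24)
model.fit(X_preprocessed)

# Fetch the fitted sub-transformers by name.
onehot = tf.named_transformers_["onehot"]
scaler = tf.named_transformers_["scaler"]

# The one-hot block comes first in the output, so count its columns
# to find where the scaled numeric block starts.
n_onehot = sum(len(cats) for cats in onehot.categories_)

# Undo the scaling on the numeric part of the centroids only.
centers = model.cluster_centers_
numeric_centers = pd.DataFrame(
    scaler.inverse_transform(centers[:, n_onehot:]),
    columns=numeric,
)
print(numeric_centers)

# Round-trip check on the data itself: the numeric block should come
# back as the original values if the columns were sliced correctly.
X_dense = (
    X_preprocessed.toarray()
    if hasattr(X_preprocessed, "toarray")
    else np.asarray(X_preprocessed)
)
restored = pd.DataFrame(
    scaler.inverse_transform(X_dense[:, n_onehot:]),
    columns=numeric,
)
```

The one-hot part of each centroid (centers[:, :n_onehot]) is left as is here; those values can be read as the share of each category within the cluster.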
Upvotes: 2