X_train, y_train from transformed data

Question

How do I obtain X_train and y_train separately after transforming the data?

Code

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.preprocessing import StandardScaler


DATA = pd.read_csv("/storage/emulated/0/Download/iris-write-from-docker.csv")

X = DATA.drop(["class"], axis="columns")
y = DATA["class"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

pipe = Pipeline(steps=[("clf", StandardScaler())])
dta = pipe.fit_transform(X_train, y_train)

print(dta)

# print(X_train, y_train) from dta

I want to obtain the transformed X_train and y_train from dta.

Antoine Dubuis · Accepted Answer

The output of fit_transform() is the transformed version of X_train only. y_train plays no part in the result: the pipeline passes it on to StandardScaler, which ignores it, because scaling depends only on the feature values.
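A quick way to check this claim is to fit the same pipeline with and without the labels and compare the outputs. The sketch below assumes scikit-learn's bundled iris data (`load_iris`) in place of the question's CSV file, and renames the step to `"scaler"` for readability; both are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled iris data stands in for the question's CSV file.
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit one fresh pipeline with the labels and another without them.
with_y = Pipeline(steps=[("scaler", StandardScaler())]).fit_transform(X_train, y_train)
without_y = Pipeline(steps=[("scaler", StandardScaler())]).fit_transform(X_train)

# Same features in, same scaled features out: the labels had no effect.
print(np.allclose(with_y, without_y))  # True
# Only the feature columns come back; y_train is not part of the output.
print(with_y.shape == X_train.shape)  # True
```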

Therefore, you can simply capture the return value to get the transformed X_train; y_train stays exactly as it was:

pipe = Pipeline(steps=[("scaler", StandardScaler())])
X_train_scaled = pipe.fit_transform(X_train)  # y_train needs no transformation
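Note that fit_transform() returns a NumPy array, so column names and the row index are lost. A minimal sketch, again assuming the bundled iris data instead of the question's CSV, wraps the result back into a DataFrame so it stays aligned with y_train, and scales X_test with transform() so the test split reuses the training statistics rather than being refit.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled iris data replaces the question's CSV file.
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

pipe = Pipeline(steps=[("scaler", StandardScaler())])

# fit_transform returns a NumPy array; wrap it to keep column names and row index.
X_train_scaled = pd.DataFrame(
    pipe.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
# Reuse the mean/std learned on the training split; do not refit on test data.
X_test_scaled = pd.DataFrame(
    pipe.transform(X_test), columns=X_test.columns, index=X_test.index
)

# The labels are untouched and line up with the scaled rows by index.
print(X_train_scaled.index.equals(y_train.index))  # True
```

Keeping the original index means X_train_scaled and y_train can be joined or sliced together later without any reordering surprises.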
