I am writing a function where the best model is chosen over a k-fold cross validation. Inside the function, I have a pipeline that
Then I want to use the model to predict some target values. To do so, I have to apply the same scaling that has been applied during the grid search.
Does the pipeline transform the data for which I want to predict the target using the same fit as for the training data, even though I do not specify it? I've been looking in the documentation, and from here it seems that it does, but I'm not sure at all, since this is the first time I have used pipelines.
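import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor


def build_model(data, target, param_grid):
    # compute feature range (data is expected to be a pandas DataFrame)
    features = data.keys()
    feature_range = dict()
    maxs = data.max(axis=0)
    mins = data.min(axis=0)
    for feature in features:
        # compare strings with !=, not with the identity operator "is not"
        if feature != 'metric':
            feature_range[feature] = {'max': maxs[feature], 'min': mins[feature]}

    # initialise the k-fold cross validator
    no_split = 10
    kf = KFold(n_splits=no_split, shuffle=True, random_state=42)

    # create the pipeline: scaling first, then the grid search
    pipe = make_pipeline(MinMaxScaler(),
                         GridSearchCV(
                             estimator=DecisionTreeRegressor(),
                             param_grid=param_grid,
                             n_jobs=-1,
                             cv=kf,
                             refit=True))
    pipe.fit(data, target)

    return pipe, feature_range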
max_depth = np.arange(1, 10)
min_samples_split = np.arange(2, 10)
min_samples_leaf = np.arange(2, 10)
param_grid = {'max_depth': max_depth,
              'min_samples_split': min_samples_split,
              'min_samples_leaf': min_samples_leaf}
pipe, feature_range = build_model(data=data, target=target, param_grid=param_grid)

# could that be correct?
pipe.predict(test_data)
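One way to check this empirically is to compare the pipeline's prediction with scaling and predicting by hand, using the steps the pipeline has already fitted. The sketch below is only an illustration under assumptions: the data is synthetic, the parameter grid is reduced, and `minmaxscaler` and `gridsearchcv` are the step names that `make_pipeline` generates from the class names.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical, not the data from the question).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(60, 3))
y_train = X_train[:, 0] * 2.0 + rng.normal(size=60)
X_test = rng.uniform(0, 100, size=(10, 3))

pipe = make_pipeline(
    MinMaxScaler(),
    GridSearchCV(
        estimator=DecisionTreeRegressor(random_state=0),
        param_grid={'max_depth': [2, 3, 4]},
        cv=KFold(n_splits=5, shuffle=True, random_state=42),
        refit=True,
    ),
)
pipe.fit(X_train, y_train)

# Prediction through the pipeline, without scaling X_test explicitly.
pred_pipeline = pipe.predict(X_test)

# The same prediction done by hand with the already-fitted steps:
# transform with the scaler fitted on X_train, then predict.
scaler = pipe.named_steps['minmaxscaler']
search = pipe.named_steps['gridsearchcv']
pred_manual = search.predict(scaler.transform(X_test))

# True if the pipeline reused the scaling learned during fit.
same = np.allclose(pred_pipeline, pred_manual)
print(same)
```

If the two results agree, `predict` on the pipeline has only called `transform` on the scaler, never `fit`, so the new data is scaled with the minimum and maximum learned from the training data.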
EDIT: I found in the documentation for [preprocessing] that each preprocessing tool has an API to
compute the [transformation] on a training set so as to be able to reapply the same transformation on the testing set
If that is the case, the pipeline may store the fitted transformation internally, and therefore the answer may be yes.
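A small sketch of what "saving the transformation" looks like for `MinMaxScaler`: after `fit`, the per-feature minimum and maximum are kept in the `data_min_` and `data_max_` attributes, and `transform` reuses them on any later data. The arrays here are made-up values chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training and test values (hypothetical, for illustration only).
X_train = np.array([[0.0, 10.0],
                    [5.0, 20.0],
                    [10.0, 30.0]])
X_test = np.array([[20.0, 15.0]])

scaler = MinMaxScaler()
scaler.fit(X_train)

# Statistics learned from the training set and stored on the scaler.
train_min = scaler.data_min_.copy()  # per-feature minimum: 0 and 10
train_max = scaler.data_max_.copy()  # per-feature maximum: 10 and 30

# transform() uses the stored statistics and does not refit:
# (20 - 0) / (10 - 0) = 2.0 and (15 - 10) / (30 - 10) = 0.25
scaled_test = scaler.transform(X_test)
print(scaled_test)

# The stored statistics are unchanged after transforming new data.
unchanged = (np.array_equal(scaler.data_min_, train_min)
             and np.array_equal(scaler.data_max_, train_max))
print(unchanged)
```

Note that the first test value falls outside the training range and is therefore mapped to a value above 1, which shows that the test set's own minimum and maximum were not used.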