siva jothi

How can I do hyperparameter tuning for a PMML model in Python?

I am creating a RandomForest PMML model in Python using the following code:

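from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Wrap the classifier in a PMMLPipeline so that it can be exported to PMML
pipeline = PMMLPipeline([("random", RandomForestClassifier())])
pipeline.fit(X_train, y_train)
sklearn2pmml(pipeline, "classification pmml file/random.pmml", with_repr=True)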

I then load the saved RandomForest model using the following code:

from pypmml import Model
rf = Model.fromFile('classification pmml file/random.pmml')

How can I do hyperparameter tuning for this RandomForest PMML model in Python?

Answers (1)

user1808924

You can do hyperparameter tuning as usual; there is no need to do anything special if the resulting tuned pipeline is converted to PMML representation using the SkLearn2PMML package. Tuning happens on the scikit-learn side, before export; the PMML file is the already-tuned result.

In brief, if you're only tuning one estimator, simply wrap it in GridSearchCV in place. For example:

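from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
  ("tuned-rf", GridSearchCV(RandomForestClassifier(..), param_grid = {..}))
])
pipeline.fit(X, y)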
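
A minimal runnable sketch of this first pattern, assuming only scikit-learn is installed. To run without sklearn2pmml or a Java runtime it uses scikit-learn's own Pipeline; PMMLPipeline is a subclass of it, so in practice you would substitute PMMLPipeline and then export with sklearn2pmml(pipeline, ...). The iris data and the grid values are illustrative assumptions, not taken from the question.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline  # assumption: stand-in for PMMLPipeline

X, y = load_iris(return_X_y=True)

# Illustrative search space; replace with your own grid
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}

# GridSearchCV is a single step *inside* the pipeline
pipeline = Pipeline([
    ("tuned-rf", GridSearchCV(RandomForestClassifier(random_state=0),
                              param_grid=param_grid, cv=3)),
])
pipeline.fit(X, y)

# The fitted search object exposes the winning hyperparameters
search = pipeline.named_steps["tuned-rf"]
print(search.best_params_)
```

Because the step keys in param_grid refer directly to RandomForestClassifier arguments, no step-name prefix is needed in this layout.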

If you're tuning multiple estimators, then you can treat GridSearchCV as a top-level workflow engine and wrap the whole pipeline in it. The tuned pipeline can then be obtained from the GridSearchCV.best_estimator_ attribute:

pipeline = PMMLPipeline([
  ("rf", RandomForestClassifier(..))
])
gridsearch = GridSearchCV(pipeline, param_grid = {..})
gridsearch.fit(X, y)
pipeline = gridsearch.best_estimator_
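
When GridSearchCV wraps the whole pipeline, each parameter key must be prefixed with the step name and a double underscore (here `rf__`), which the `{..}` placeholder hides. A minimal sketch, assuming only scikit-learn is installed: it uses scikit-learn's Pipeline as a stand-in for PMMLPipeline (its subclass), and the iris data and grid values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline  # assumption: stand-in for PMMLPipeline

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("rf", RandomForestClassifier(random_state=0)),
])

# Keys are "<step name>__<parameter>" because the search wraps the pipeline
param_grid = {"rf__n_estimators": [10, 50], "rf__max_depth": [2, None]}

gridsearch = GridSearchCV(pipeline, param_grid=param_grid, cv=3)
gridsearch.fit(X, y)

# With the default refit=True, best_estimator_ is the whole pipeline,
# refitted on all of X with the best parameters
pipeline = gridsearch.best_estimator_
print(gridsearch.best_params_)
```

The object in best_estimator_ is a complete pipeline, which is why it can be handed to the PMML converter just like an untuned one.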

For more details, see the following technical article: https://openscoring.io/blog/2019/12/25/converting_sklearn_gridsearchcv_pipeline_pmml/
