Reputation: 11
I am creating a RandomForest PMML model using the following code in Python:
from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml import sklearn2pmml
rf = RandomForestClassifier()
rf = PMMLPipeline([('random', rf)])
rf.fit(X_train, y_train)
sklearn2pmml(rf, "classification pmml file/random.pmml", with_repr=True)
I then load the saved RandomForest model using the following code in Python:
from pypmml import Model
rf = Model.fromFile('classification pmml file/random.pmml')
How can I do HyperParameter Tuning for this RandomForest PMML model in Python?
Upvotes: 1
Views: 147
Reputation: 4926
You can do hyperparameter tuning as usual; nothing special is needed as long as the resulting tuned pipeline is converted to its PMML representation using the SkLearn2PMML package.
In brief, if you're only tuning one estimator, simply wrap it in GridSearchCV in place. For example:
pipeline = PMMLPipeline([
    ("tuned-rf", GridSearchCV(RandomForestClassifier(..), param_grid = {..}))
])
pipeline.fit(X, y)
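To make the elided parts concrete, here is a minimal sketch of the same pattern. It uses scikit-learn's plain Pipeline as a stand-in so it runs without sklearn2pmml installed; PMMLPipeline subclasses Pipeline, so swapping it in should work the same way. The synthetic data and the parameter grid values are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline  # stand-in for sklearn2pmml's PMMLPipeline

# Synthetic data, only for illustration
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

# GridSearchCV wraps the estimator directly, so parameter names need no prefix
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

pipeline = Pipeline([
    ("tuned-rf", GridSearchCV(RandomForestClassifier(random_state=0),
                              param_grid=param_grid, cv=3)),
])
pipeline.fit(X, y)

# The fitted search object lives inside the pipeline step
search = pipeline.named_steps["tuned-rf"]
best_params = search.best_params_
print(best_params)
```

After fitting, pipeline.predict(X) delegates to the refitted best estimator found by the search.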
If you're tuning multiple estimators, you can treat GridSearchCV as a top-level workflow engine and wrap the whole pipeline in it. The tuned pipeline is then available as the GridSearchCV.best_estimator_ attribute:
pipeline = PMMLPipeline([
    ("rf", RandomForestClassifier(..))
])
gridsearch = GridSearchCV(pipeline, param_grid = {..})
gridsearch.fit(X, y)
pipeline = gridsearch.best_estimator_
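One detail worth spelling out for this second variant: when GridSearchCV wraps a whole pipeline, each grid key has to be prefixed with the step name and a double underscore (here "rf__"). A minimal sketch follows, again with scikit-learn's plain Pipeline standing in for PMMLPipeline so that it runs without sklearn2pmml; the data and grid values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline  # stand-in for sklearn2pmml's PMMLPipeline

# Synthetic data, only for illustration
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipeline = Pipeline([
    ("rf", RandomForestClassifier(random_state=0)),
])

# Keys follow the "<step name>__<parameter>" convention
param_grid = {"rf__n_estimators": [10, 30], "rf__max_depth": [3, None]}

gridsearch = GridSearchCV(pipeline, param_grid=param_grid, cv=3)
gridsearch.fit(X, y)

# best_estimator_ is a refitted clone of the pipeline with the winning parameters
pipeline = gridsearch.best_estimator_
print(gridsearch.best_params_)

# With a PMMLPipeline, the export step would then be the usual call, e.g.:
# sklearn2pmml(pipeline, "classification pmml file/random.pmml", with_repr=True)
```

The tuning happens entirely on the scikit-learn side before export; the PMML file only records the final fitted model.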
For more details, see the following technical article: https://openscoring.io/blog/2019/12/25/converting_sklearn_gridsearchcv_pipeline_pmml/
Upvotes: 1