Vladimir Sazonov

Reputation: 139

How can I use a PMML model in a PySpark script?

I have an XGBoost model that was trained in pure Python and converted to PMML format. Now I need to use this model in a PySpark script, but I am out of ideas on how to do it. Are there methods that allow importing a PMML model in Python and using it for prediction? Thanks for any suggestions.

BR,
Vladimir

Upvotes: 1

Views: 1658

Answers (2)

PredictFuture

Reputation: 226

You could use PyPMML-Spark to import a PMML model in a PySpark script, for example:

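from pypmml_spark import ScoreModel

# Load the model from the PMML file
model = ScoreModel.fromFile('the/pmml/file/path')
# df is an existing Spark DataFrame that holds the model's input columns
score_df = model.transform(df)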
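
Before calling transform, it may help to check that df actually has columns named after the model's input fields. Below is a minimal sketch that reads those names from the PMML XML using only the standard library. The helper name pmml_input_fields and the trimmed SAMPLE_PMML document are hypothetical and only for illustration. It assumes the first MiningSchema in the file belongs to the top-level model and that a missing usageType means "active".

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace so the code works across PMML versions."""
    return tag.rsplit('}', 1)[-1]


def pmml_input_fields(pmml_text):
    """Return the names of the active MiningFields in the first MiningSchema.

    Assumption: the first MiningSchema in document order is the one of the
    top-level model, and a MiningField without usageType is an input.
    """
    root = ET.fromstring(pmml_text)
    for elem in root.iter():
        if _local(elem.tag) == 'MiningSchema':
            return [
                field.get('name')
                for field in elem
                if _local(field.tag) == 'MiningField'
                and field.get('usageType', 'active') == 'active'
            ]
    return []


# Hypothetical, heavily trimmed PMML document used only for this example.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="regression">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1"/>
      <MiningField name="x2" usageType="active"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""

required = pmml_input_fields(SAMPLE_PMML)
df_columns = ['x1', 'y']  # stand-in for df.columns of a real Spark DataFrame
missing = [name for name in required if name not in df_columns]
print(missing)
```

For a real model you would pass the text of your own PMML file and compare against df.columns. Any names left in missing are columns you would need to add or rename first.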

Upvotes: 1

Assaf Mendelson

Reputation: 13001

Spark does not support importing from PMML directly. While I have not encountered a PySpark PMML importer, there is one for Java (https://github.com/jpmml/jpmml-evaluator-spark). What you can do is wrap the Java (or Scala) code so that you can access it from Python (e.g. see http://aseigneurin.github.io/2016/09/01/spark-calling-scala-code-from-pyspark.html).
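
The Python side of such a wrapper could look roughly like the sketch below. The function name transform_via_jvm and its wrap parameter are hypothetical. java_transformer is assumed to be a py4j handle to an already-built JVM-side Spark ML Transformer (for example one produced by the Java library above, with its JAR on the Spark classpath); building that object is not shown here. The sketch relies on PySpark's internal df._jdf attribute and assumes PySpark 3.3 or later for df.sparkSession, so treat it as a starting point and not as a supported API.

```python
def transform_via_jvm(df, java_transformer, wrap=None):
    """Run a JVM-side Transformer on a PySpark DataFrame.

    df               -- a PySpark DataFrame
    java_transformer -- py4j handle to a JVM object with transform(Dataset)
                        (assumption: built elsewhere via the py4j gateway)
    wrap             -- optional callable that turns the returned Java
                        Dataset into a Python DataFrame
    """
    if wrap is None:
        # Imported lazily so the module can be loaded without PySpark.
        from pyspark.sql import DataFrame

        def wrap(jdf):
            # Assumption: PySpark 3.3+, where df.sparkSession is available.
            return DataFrame(jdf, df.sparkSession)

    # df._jdf is the underlying Java Dataset (internal PySpark attribute).
    java_result = java_transformer.transform(df._jdf)
    return wrap(java_result)
```

With a real session this would be called as score_df = transform_via_jvm(df, java_transformer), and the result can then be used like any other PySpark DataFrame.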

Upvotes: 3
