sdaza

Reputation: 1052

How to get probabilities when using score batch in Pyspark with Feature Store?

I'm using FeatureEngineeringClient from databricks.feature_engineering to work with the feature store.

I'd like to perform a score batch inference.

After logging a simple RF classifier:

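from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()
# Log the trained classifier together with its feature lookup metadata
fe.log_model(
    model=model,
    artifact_path=artifact_path,
    flavor=flavor,
    training_set=training_set,
    registered_model_name=model_name,
)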

I want to do score batch inference, but I need the probabilities, not just the predicted label.

The score_batch function only returns the predicted label. Even if I change the name of the prediction column in the classifier, the prediction expected by score_batch still has to be a double.

prediction_df = fe.score_batch(
  model_uri=uc_modeling.get_lastest_model_uri(), 
  df=batch_input_df)

Thanks!

Upvotes: 0

Views: 242

Answers (1)

JayashankarGS

Reputation: 8140

Posting this as an answer so that it helps the community find a better solution.

Since score_batch doesn't support predict_proba, you can load the model with mlflow.pyfunc.load_model and predict the probabilities yourself.

Here is the code:

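import mlflow
import pandas as pd

logged_model = path_to_model  # URI or local path of the logged model
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Note: the generic pyfunc wrapper documents only predict(); if predict_proba
# is not available on it, load the model with its native flavor instead.
loaded_model.predict_proba(pd.DataFrame(data))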

The code above loads the model as a generic Python function, but you can also load it with the flavor that matches its model type.

Example:

mlflow.<model-type>.load_model(modelpath)

You can read more about this here.

Upvotes: 0
