PySpark trained Logistic Regression model has no predict() and predictProbability() functions

Question

I trained a logistic regression model with PySpark MLlib's built-in LogisticRegression class. However, after training, I couldn't use it to predict on other DataFrames. I get either AttributeError: 'LogisticRegression' object has no attribute 'predictProbability' or AttributeError: 'LogisticRegression' object has no attribute 'predict'.

from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)

# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')

model.fit(df_train)

model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])

AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'

Properties:

PySpark version:

>>import pyspark
>>pyspark.__version__
3.1.2

JDK version:

>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)

Environment: Google Colab

AdibP · Accepted Answer

This line in your code

model.fit(df_train)

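did not actually give you a trained model. The variable model is still an instance of the pyspark.ml.classification.LogisticRegression class (the estimator), because fit() returns the trained model as a new object rather than modifying the estimator in place.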

type(model)

# pyspark.ml.classification.LogisticRegression

So you need to capture the returned object, either by assigning it to a new variable or by overwriting your model variable. That object is the trained logistic regression model, an instance of the pyspark.ml.classification.LogisticRegressionModel class

model = model.fit(df_train)
type(model)

# pyspark.ml.classification.LogisticRegressionModel

Finally, the .predict and .predictProbability methods of the trained model take a single feature vector (for example a pyspark.ml.linalg.DenseVector object), not a DataFrame column such as df_val['features']. So I think you want to use .transform instead, since it adds the predicted label and probability as columns to the input DataFrame. It would look like this

predicted_df = model.transform(df_val)
