Reputation: 1681
I trained a logistic regression model with PySpark MLlib's built-in LogisticRegression class. However, after training, I couldn't use it to predict on other DataFrames: the calls fail with AttributeError: 'LogisticRegression' object has no attribute 'predictProbability' or AttributeError: 'LogisticRegression' object has no attribute 'predict'.
from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)
# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')
model.fit(df_train)
model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'
Properties:
PySpark version:
>>import pyspark
>>pyspark.__version__
3.1.2
JDK version:
>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
Environment: Google Colab
Upvotes: 0
Views: 1372
Reputation: 2939
In your code, the call model.fit(df_train) did not actually give you a trained model: its return value was discarded, so the variable model is still an instance of the pyspark.ml.classification.LogisticRegression class:
type(model)
# pyspark.ml.classification.LogisticRegression
So you should capture the returned object, either by assigning it to a new variable or by overwriting your model variable. That gives you the trained logistic regression model, an instance of the pyspark.ml.classification.LogisticRegressionModel class:
model = model.fit(df_train)
type(model)
# pyspark.ml.classification.LogisticRegressionModel
Finally, the .predict and .predictProbability methods take a single feature vector (for example a pyspark.ml.linalg.DenseVector object), not a DataFrame column such as df_val['features']. So I think you want to use .transform instead, since it adds the predicted label and probability as columns to the input DataFrame. It would look like this:
predicted_df = model.transform(df_val)
Upvotes: 1