user3245256

Reputation: 1948

Spark ML Logistic Regression in Python: Set the model threshold to maximize F-Measure

I've trained a logistic regression in Spark using a pipeline. It ran, and I am now looking at the model diagnostics.

I created my model summary (lr_summary = lrModel.stages[-1].summary). After that I pretty much copied the code from this webpage. It all works until I try to determine the best threshold based on F-measure using this example Python code:

# Set the model threshold to maximize F-Measure
fMeasure = lr_summary.fMeasureByThreshold
maxFMeasure = fMeasure.groupBy().max('F-Measure').select('max(F-Measure)').head()
bestThreshold = fMeasure.where(fMeasure['F-Measure'] == maxFMeasure['max(F-Measure)']).select('threshold').head()['threshold']
lr.setThreshold(bestThreshold)

Unfortunately, I am getting an error on line 3 (bestThreshold = ...): TypeError: 'NoneType' object has no attribute '__getitem__'

Any advice?

Thank you so much!

Upvotes: 0

Views: 451

Answers (1)

Alper t. Turker

Reputation: 35229

I cannot reproduce this problem, but it is possible that the model doesn't have a summary (although in that case I would expect an AttributeError on the maxFMeasure = ... line). You can check whether the model has one:

lrModel.stages[-1].hasSummary

Also you can make this code much simpler:

bestThreshold = fMeasure.orderBy(fMeasure['F-Measure'].desc()).first().threshold
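One more thing to watch for: in PySpark, head() and first() return None when the DataFrame has no rows, which matches the 'NoneType' error in the question, and the one-liner above would fail in the same way on an empty result. Below is a minimal sketch of a guarded selection. It assumes the (threshold, F-Measure) pairs have already been collected to the driver. The helper name best_threshold and the sample values are made up for illustration and are not part of the Spark API.

```python
import math


def best_threshold(pairs):
    """Return the threshold with the highest F-measure, or None if there is none.

    `pairs` is an iterable of (threshold, f_measure) tuples.
    """
    # Skip missing or NaN scores so they cannot be picked as the maximum.
    valid = [(t, f) for t, f in pairs if f is not None and not math.isnan(f)]
    if not valid:
        return None
    # max() keeps the first maximal element, so ties go to the earliest pair.
    return max(valid, key=lambda pair: pair[1])[0]


# Hypothetical stand-in for the collected summary, which could be built with:
#   [(r['threshold'], r['F-Measure']) for r in fMeasure.collect()]
sample = [(0.9, 0.40), (0.5, 0.72), (0.1, 0.55)]

best = best_threshold(sample)
empty_result = best_threshold([])  # None: nothing to choose from
```

If the result is None, the summary contained no usable rows, and it is worth inspecting fMeasure.show() before calling setThreshold.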

Upvotes: 1
