Reputation: 1948
I've trained a logistic regression model in Spark using a pipeline. It ran, and I am now looking at model diagnostics.
I created my model summary (lr_summary = lrModel.stages[-1].summary). After that I pretty much copied the code from this webpage. It all works until I try to determine the best threshold based on F-measure using this example Python code:
# Set the model threshold to maximize F-Measure
fMeasure = lr_summary.fMeasureByThreshold
maxFMeasure = fMeasure.groupBy().max('F-Measure').select('max(F-Measure)').head()
bestThreshold = fMeasure.where(fMeasure['F-Measure'] == maxFMeasure['max(F-Measure)']).select('threshold').head()['threshold']
lr.setThreshold(bestThreshold)
Unfortunately, I am getting an error on line 3 (bestThreshold = ...): TypeError: 'NoneType' object has no attribute '__getitem__'
Any advice?
Thank you so much!
Upvotes: 0
Views: 451
Reputation: 35229
I cannot reproduce this problem, but it is possible that the model doesn't have a summary (in that case I would expect an attribute error on the maxFMeasure = ... line). You can check whether the model has one:
lrModel.stages[-1].hasSummary
You can also make this code much simpler:
bestThreshold = fMeasure.orderBy(fMeasure['F-Measure'].desc()).first().threshold
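Note that head() and first() return None when the DataFrame has no rows, which would produce this kind of NoneType error as soon as the result is indexed. As a rough sketch (not something I have run against your data), the selection can also be done in plain Python on collected rows with an explicit guard. The helper name best_threshold is hypothetical, and it assumes the F-measure DataFrame is small enough to collect to the driver:

```python
import math


def best_threshold(rows, metric="F-Measure"):
    """Return the threshold with the highest metric value, or None if no row is usable.

    `rows` is any iterable of records that can be indexed by column name,
    for example the list returned by fMeasure.collect().
    """
    # Skip rows whose metric is missing or NaN so they cannot win the comparison.
    usable = [
        r for r in rows
        if r[metric] is not None and not math.isnan(r[metric])
    ]
    if not usable:
        return None
    return max(usable, key=lambda r: r[metric])["threshold"]


# Hypothetical usage with the names from the question:
# rows = lr_summary.fMeasureByThreshold.collect()
# bestThreshold = best_threshold(rows)
# if bestThreshold is not None:
#     lr.setThreshold(bestThreshold)
```

If this returns None, the F-measure table itself is empty or contains no valid values, which would point at the summary rather than at the threshold selection.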
Upvotes: 1