Reputation: 121
In FastText I want to change the balance between precision and recall. Can it be done?
Upvotes: 1
Views: 704
Reputation: 974
If you're referring to the Python fastText implementation, then I'm afraid there is no simple built-in method to do this. What you can do is take the returned probabilities and pass them to an ROC or precision-recall curve method of your choice. Here is a code example that does just this for a binary classifier:
import re

# label the data: fastText expects one line per example, so newlines become spaces
labels, probabilities = fasttext_classifier.predict(
    [re.sub('\n', ' ', sentence) for sentence in test_sentences])
# convert fasttext multilabel results to a binary classifier (probability of TRUE)
labels = [label == ['__label__TRUE'] for label in labels]
probabilities = [probability[0] if label else (1 - probability[0])
                 for label, probability in zip(labels, probabilities)]
And then you are free to build your metrics using the common sklearn methods:
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from matplotlib import pyplot

# testy holds the ground-truth labels of test_sentences (True for __label__TRUE)
roc_auc = roc_auc_score(testy, probabilities)
print('ROC AUC=%.3f' % roc_auc)
# calculate roc curve
fpr, tpr, _ = roc_curve(testy, probabilities)
# plot the roc curve for the model
pyplot.plot(fpr, tpr, marker='.', label='ROC curve')
# axis labels
pyplot.xlabel('False Positive Rate (1 - specificity)')
pyplot.ylabel('True Positive Rate (sensitivity)')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()
precision_values, recall_values, _ = precision_recall_curve(testy, probabilities)
f1 = f1_score(testy, labels)
# summarize scores
print('f1=%.3f auc=%.3f' % (f1, roc_auc))
# plot the precision-recall curves
pyplot.plot(recall_values, precision_values, marker='.', label='Precision,Recall')
# axis labels
pyplot.xlabel('Recall')
pyplot.ylabel('Precision')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()
The command-line version of fastText has a threshold parameter, and you could perform multiple runs with different thresholds, but this is needlessly time-consuming.
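Once you have the probability of TRUE for every test example, changing the balance between precision and recall comes down to choosing a decision threshold other than the default. Here is a minimal sketch of that idea in plain Python. The function names, the sample labels, and the sample scores are made up for illustration and are not part of fastText or sklearn. It returns the lowest threshold that reaches a target precision, which is also the one that keeps recall as high as possible:

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting TRUE for every score >= threshold."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, t in zip(predicted, y_true) if p and t)
    fp = sum(1 for p, t in zip(predicted, y_true) if p and not t)
    fn = sum(1 for p, t in zip(predicted, y_true) if not p and t)
    # Convention: precision is 1.0 when nothing is predicted TRUE
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_for_precision(y_true, scores, target_precision):
    """Return (threshold, precision, recall) for the lowest observed score
    that reaches target_precision, or None if no threshold does."""
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(y_true, scores, threshold)
        if precision >= target_precision:
            return threshold, precision, recall
    return None


# Hypothetical ground truth and P(TRUE) values, standing in for testy / probabilities
y_true = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(lowest_threshold_for_precision(y_true, scores, 0.75))  # (0.6, 0.75, 1.0)
```

With the real data you would pass testy and probabilities instead of the sample lists, then label an example TRUE whenever its probability is at or above the chosen threshold. Raising the target precision pushes the threshold up and recall down.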
Upvotes: 1