Reputation: 101
While the scikit-learn documentation is fantastic, I couldn't find whether there is a way to specify a custom error function to optimize in a classification problem.
Backing up a bit: I'm working on a text classification problem where false positives are much better than false negatives. I'm labeling text as important to a user, so a false positive at worst wastes a small amount of the user's time, whereas a false negative means potentially important information is never seen. Therefore I'd like to scale the false negative errors up (or the false positive errors down, whichever) during optimization.
I understand that each algorithm optimizes a different error function, so there isn't a one-size-fits-all way to supply a custom one. But is there another approach? For instance, scaling the labels might work for an algorithm that treats labels as real values, but not for an SVM, which likely maps the labels to -1/+1 under the hood anyway.
Upvotes: 0
Views: 286
Reputation: 363838
Some estimators take a class_weight constructor argument. Assuming that your classes are ["neg", "pos"], you can give the positive class an arbitrarily higher weight than the negative one, so that misclassifying a positive sample (a false negative) is penalized more heavily, e.g.:
from sklearn.svm import LinearSVC

clf = LinearSVC(class_weight={"pos": 10, "neg": 1})
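As a quick sanity check, here is a minimal sketch (with a made-up toy corpus, not from your data) showing how the weighting biases predictions toward the positive class:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy corpus; labels are strings, so the class_weight keys match them.
texts = ["urgent meeting today", "lunch menu", "project deadline moved", "weather is nice"]
labels = ["pos", "neg", "pos", "neg"]

X = TfidfVectorizer().fit_transform(texts)

# Weighting "pos" 10x makes errors on positive samples costlier,
# pushing the decision boundary toward predicting "pos" (fewer false negatives).
clf = LinearSVC(class_weight={"pos": 10, "neg": 1}).fit(X, labels)
print(clf.predict(X))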
Then, when you're using GridSearchCV to optimize the hyperparameters of the estimator, you should change the scorer to one that penalizes false negatives more heavily than false positives, such as a variant of Fᵦ with high β (which weights recall over precision):
from sklearn.metrics import fbeta_score
from sklearn.model_selection import GridSearchCV

def f3_scorer(estimator, X, y_true):
    # beta=3 weights recall three times as heavily as precision;
    # pos_label is needed because the labels are strings, not 0/1.
    y_pred = estimator.predict(X)
    return fbeta_score(y_true, y_pred, beta=3, pos_label="pos")

# params is your hyperparameter grid, e.g. {"C": [0.1, 1, 10]}
gs = GridSearchCV(clf, params, scoring=f3_scorer)
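Equivalently, as a small variation, scikit-learn's make_scorer can build the same scorer without a hand-written wrapper, since it forwards keyword arguments to the metric:

from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# make_scorer wraps the metric; beta and pos_label are passed through to fbeta_score.
f3_scorer = make_scorer(fbeta_score, beta=3, pos_label="pos")
gs = GridSearchCV(clf, params, scoring=f3_scorer)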
Upvotes: 1