user2979931

Reputation: 101

Scikit-learn: applying a custom error function to favor False Positives?

While the scikit-learn documentation is fantastic, I couldn't find whether there is a way to specify a custom error function to optimize in a classification problem.

Backing up a bit, I'm working on a text classification problem where False Positives are much better than False Negatives. This is because I am labeling the text as important to a user, and false positives at worst would waste a small amount of time for the user, whereas false negatives would cause some potentially important information to never be seen. Therefore I'd like to scale the False Negative errors up (or False Positive errors down, whichever) during optimization.

I understand that each algorithm optimizes a different error function, so there isn't a one-size-fits-all solution in terms of supplying a custom error function. But is there another way? For example, scaling the labels could work for an algorithm that treats labels as real values, but it wouldn't work for an SVM, which likely rescales the labels to -1/+1 under the hood anyway.

Upvotes: 0

Views: 286

Answers (1)

Fred Foo

Reputation: 363838

Some estimators take a class_weight constructor argument. Assuming that your classes are ["neg", "pos"], you can give the positive class an arbitrarily higher weight than the negative class, so that a misclassified "pos" sample (a false negative) is penalized more heavily than a false positive, e.g.:

from sklearn.svm import LinearSVC

clf = LinearSVC(class_weight={"neg": 1, "pos": 10})
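A minimal sketch of the effect (the toy data, the integer 0/1 labels, and the 10:1 weight are illustrative assumptions, not part of the original answer): up-weighting the positive class shifts the decision boundary so the classifier predicts "positive" more readily, trading false negatives for false positives.

```python
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix

# Toy two-class problem; class 1 plays the role of "pos".
X, y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=0)

plain = LinearSVC().fit(X, y)
# Penalize errors on the positive class 10x as much: fewer false
# negatives, at the cost of (possibly) more false positives.
weighted = LinearSVC(class_weight={0: 1, 1: 10}).fit(X, y)

for name, clf in [("plain", plain), ("weighted", weighted)]:
    tn, fp, fn, tp = confusion_matrix(y, clf.predict(X)).ravel()
    print(name, "FP:", fp, "FN:", fn)
```

Comparing the two confusion matrices should show the weighted model producing no more false negatives than the unweighted one.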

Then, when you're using GridSearchCV to optimize the hyperparameters of the estimator, you should change the scorer to one that favors false positives, such as a variant of Fᵦ with high β:

from sklearn.metrics import fbeta_score
from sklearn.model_selection import GridSearchCV

def f3_scorer(estimator, X, y_true):
    # F-beta with beta=3 weights recall far more than precision, so
    # false negatives hurt the score much more than false positives.
    y_pred = estimator.predict(X)
    return fbeta_score(y_true, y_pred, beta=3, pos_label="pos")

gs = GridSearchCV(clf, params, scoring=f3_scorer)
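Equivalently, scikit-learn's make_scorer helper can wrap the metric into the scorer signature GridSearchCV expects, avoiding the hand-written function. A self-contained sketch (the parameter grid and toy data here are illustrative assumptions):

```python
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

# make_scorer adapts fbeta_score(y_true, y_pred, beta=3) into a
# scorer callable with the (estimator, X, y) interface.
f3_scorer = make_scorer(fbeta_score, beta=3)

params = {"C": [0.1, 1, 10]}  # illustrative hyperparameter grid
gs = GridSearchCV(LinearSVC(), params, scoring=f3_scorer)

X, y = make_classification(n_samples=200, random_state=0)
gs.fit(X, y)
print(gs.best_params_)
```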

Upvotes: 1
