dzieciou

Reputation: 4514

Measuring classifier accuracy at the word level

I have a list of lists corresponding to sentences of words.

X = [
        ['John','has','house'],
        ['Mary','works','at','home']
    ]

You can think of each sentence as a training sample. My model, a tagger, tags each word with some label:

y = [
        ['proper','verb','noun'],
        ['proper','verb','prep','noun']
    ]

I would like to grid search my tagger using tagging accuracy at the word level:

search = GridSearchCV(tagger, parameters, cv=10, scoring='accuracy')
search.fit(X, y)

However, the accuracy_score() function complains:

{ValueError}You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead - the MultiLabelBinarizer transformer can convert to this format.

This does not happen when the list of lists, y, is flattened, e.g.:

y_true = ['proper','verb','noun', 'proper','verb','prep','noun']
y_pred = ['proper','verb','noun', 'proper','verb','prep','noun']
accuracy_score(y_true, y_pred)
1.0

I still want my tagger (a Keras model) to predict() a list of lists to preserve the text structure (sentences of words), but I want the scorer to evaluate accuracy at the word level.

How can I solve it in an elegant way?

Upvotes: 3

Views: 207

Answers (1)

dzieciou

Reputation: 4514

One possible solution I found is to build a custom scoring function that flattens both the true and predicted tag sequences before computing accuracy:

from sklearn.metrics import make_scorer, accuracy_score


def flatten(l):
    # Collapse a list of sentences (lists of tags) into a single flat list of tags
    return [item for sublist in l for item in sublist]

def word_accuracy_score(y_true, y_pred):
    # Flatten both sequences so accuracy is computed word by word,
    # ignoring sentence boundaries
    return accuracy_score(flatten(y_true), flatten(y_pred))

and pass it to GridSearchCV:

scorer = make_scorer(word_accuracy_score)
search = GridSearchCV(tagger, parameters, cv=10, scoring=scorer)
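As a quick sanity check, the scorer can be applied directly to the sentences from the question (the second prediction below is hypothetical and gets one word wrong); the grid search is then fitted exactly as before, and the tagger's predict() can keep returning a list of tag lists per sentence, since make_scorer simply passes the true and predicted labels through to word_accuracy_score:

y_true = [['proper','verb','noun'], ['proper','verb','prep','noun']]
y_pred = [['proper','verb','noun'], ['proper','verb','prep','verb']]
word_accuracy_score(y_true, y_pred)  # 6 of 7 words correct -> 0.857...

search.fit(X, y)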

Upvotes: 1
