jd.

Reputation: 10938

Calculating a score from multiple classifiers

I'm trying to determine the similarity between pairs of items drawn from a large collection. The items have several attributes, and for each attribute I can calculate a discrete similarity score between 0 and 1. I use various classifiers depending on the attribute: TF-IDF cosine similarity, Naive Bayes Classifier, etc.

I'm stuck when it comes to compiling all that information into a final similarity score for all the items. I can't just take an unweighted average because 1) what counts as a high score depends on the classifier and 2) some classifiers are more important than others. In addition, some classifiers should be considered only for their high scores, i.e. a high score points to a higher similarity but a low score carries no meaning.

So far I've calculated the final score with guesswork but the increasing number of classifiers makes this a very poor solution. What techniques are there to determine an optimal formula that will take my various scores and return just one? It's important to note that the system does receive human feedback, which is how some of the classifiers work to begin with.

Ultimately I'm only interested in ranking, for each item, the ones that are most similar. The absolute scores themselves are meaningless, only their ordering is important.

Upvotes: 3

Views: 954

Answers (2)

soufanom

Reputation: 396

There is a great book on the topic of ensemble classifiers. It is available online: Combining Pattern Classifiers

There are two chapters (ch4 & ch5) in this book on Fusion of Label Outputs and how to get a single decision value.

A set of methods is defined in these chapters, including:

1- Weighted Majority Vote

2- Naive Bayes Combination

3- ...
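As a rough illustration of the first method in the list, here is a minimal sketch of a weighted majority vote. It assumes each classifier outputs a label rather than a score, and that you have an estimated accuracy for each one (for example from the human feedback). The log-odds weighting is one common choice when the classifiers are treated as independent; the function name, labels, and accuracies are made up for the example.

```python
import math
from collections import defaultdict


def weighted_majority_vote(labels, accuracies):
    """Return the label with the largest total weight.

    labels: one predicted label per classifier.
    accuracies: estimated accuracy of each classifier, strictly
        between 0 and 1 (hypothetical values in this sketch).
    """
    totals = defaultdict(float)
    for label, acc in zip(labels, accuracies):
        # Log-odds weight: more accurate classifiers count more, and a
        # classifier at 50% accuracy contributes nothing.
        totals[label] += math.log(acc / (1.0 - acc))
    return max(totals, key=totals.get)


# Hypothetical votes from three attribute classifiers for one item pair.
votes = ["similar", "different", "different"]
accuracies = [0.95, 0.60, 0.60]
decision = weighted_majority_vote(votes, accuracies)
```

Here the single very reliable classifier outweighs the two weak ones, which a plain majority vote would not allow.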

I hope that this is what you were looking for.

Upvotes: 5

Has QUIT--Anony-Mousse

Reputation: 77454

Get a book on ensemble classification. There has been a lot of work on how to learn a good combination of classifiers, and there are numerous choices. You can, of course, learn weights and take a weighted average, or you can use error-correcting codes, etc.
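To make the "learn weights" option concrete, here is a minimal sketch that fits the weights with logistic regression on human feedback and then ranks candidates by the weighted sum. Logistic regression is only one possible way to learn the weights, and it is an assumption of this example, as are the function names and the toy data (two per-attribute scores per pair, with feedback 1 for "similar" and 0 for "not similar").

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def learn_weights(score_rows, feedback, steps=2000, rate=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    n = len(score_rows)
    n_features = len(score_rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(score_rows, feedback):
            z = bias + sum(w * s for w, s in zip(weights, row))
            error = sigmoid(z) - label
            for j, s in enumerate(row):
                grad_w[j] += error * s
            grad_b += error
        # Step against the mean gradient of the log-loss.
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
    return weights, bias


def combined_score(weights, row):
    # The bias is the same for every pair, so it does not affect ranking.
    return sum(w * s for w, s in zip(weights, row))


# Hypothetical training data: [attribute_1_score, attribute_2_score].
history = [[0.9, 0.5], [0.8, 0.2], [0.7, 0.8],
           [0.2, 0.6], [0.1, 0.3], [0.3, 0.7]]
feedback = [1, 1, 1, 0, 0, 0]
weights, bias = learn_weights(history, feedback)

# Rank hypothetical candidates for one item, most similar first.
candidates = {"A": [0.15, 0.4], "B": [0.85, 0.4], "C": [0.5, 0.4]}
ranked = sorted(candidates,
                key=lambda name: combined_score(weights, candidates[name]),
                reverse=True)
```

Since only the ordering matters, the raw weighted sum is enough for ranking; the sigmoid is needed only while fitting the weights.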

Anyway, read up on "ensemble classification", that is the keyword you need.

Upvotes: 3
