petrichor

Reputation: 6569

How to compute the success ratio in a classification task with multi-label assignments

There are N different classes that can be observed in my problem, and my task is to detect which ones occurred at time t (of T frames). I created two binary matrices, actualLabels and predictedLabels, of size NxT. I observed the data and filled actualLabels by hand: actualLabels(n,t) is 1 if the instance at time t involves the nth class, and 0 otherwise. This serves as my ground truth. Then I run my algorithm on the data and predict the observed classes; these labels are found automatically and stored in predictedLabels.

My question is: how can I compute a success value using these matrices? Is there a popular way to do this?

Example case: Let there be 4 classes and T=5. Let the data be

actualLabels    = 0 0 0 0 1
                  1 1 0 1 0
                  0 1 0 0 1
                  0 0 0 0 1

predictedLabels = 0 0 0 0 1
                  0 0 1 1 0
                  0 1 0 0 0
                  0 1 0 0 0

It does not seem possible to compute a conventional confusion matrix from multi-label assignments. Instead, I computed a distance for each pair of label vectors. Since I am comparing binary vectors, the Hamming distance seems a natural choice (similar to an edit distance). The problem now is that I can report the distances between predicted and actual label vectors, but not a success percentage.
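
For concreteness, here is a minimal sketch of that per-frame Hamming distance computation in Python/NumPy (my own assumption about the storage format; the variable names mirror the matrices above):

    import numpy as np

    # The example matrices from above: rows are classes, columns are frames.
    actualLabels = np.array([[0, 0, 0, 0, 1],
                             [1, 1, 0, 1, 0],
                             [0, 1, 0, 0, 1],
                             [0, 0, 0, 0, 1]])
    predictedLabels = np.array([[0, 0, 0, 0, 1],
                                [0, 0, 1, 1, 0],
                                [0, 1, 0, 0, 0],
                                [0, 1, 0, 0, 0]])

    # Hamming distance per frame: count of disagreeing label bits in each column.
    hamming_per_frame = np.sum(actualLabels != predictedLabels, axis=0)
    print(hamming_per_frame)  # -> [1 2 1 0 2]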

A confusion matrix conveys a lot of information. I would like to see a similar table that helps me see where the mistakes occur most often, the overall success rate, and so on.

Details: I have some wav data and I want to do polyphonic pitch tracking. At each time bin, any number of notes can be played together, and these form the labels I want to predict.

Note: Wikipedia lists some metrics for multi-label classification. I would be happy to learn about any other metric or plot.

Upvotes: 0

Views: 590

Answers (3)

Junier

Reputation: 1622

Similar to what @phs answered, you should consider the Hamming distances of the predictions to the actual labels. However, there is no need to count the number of predictions that fall under a given threshold (unless you really believe that predictions under that threshold are perfect and those above it are garbage). If you instead believe in a smoother loss, i.e. a prediction at Hamming distance 0 from the actual value is better than one at distance 1, which in turn is better than one at distance 2, and so on, then a good loss function is simply the average Hamming distance of your predictions to the actual values. Another possibility is to take the exponential of the Hamming distances if you believe that predictions that were relatively close are much better than those that were not (in a non-linear fashion).
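
A minimal sketch of both losses, assuming the labels are 0/1 NumPy arrays of shape NxT as in the question; the exponential base is an arbitrary illustrative choice:

    import numpy as np

    def average_hamming_loss(actual, predicted):
        # Mean over frames of the per-frame Hamming distance.
        return np.mean(np.sum(actual != predicted, axis=0))

    def exponential_hamming_loss(actual, predicted, base=2.0):
        # Penalize larger distances non-linearly; the base is an arbitrary choice.
        distances = np.sum(actual != predicted, axis=0)
        return np.mean(base ** distances - 1.0)  # subtract 1 so a perfect frame costs 0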

As far as a confusion matrix goes, it seems you are really doing 88 classifications at once per frame, where each classifier_i decides whether note_i is playing. Thus, you can look at the confusion matrix for each note, or for each pair of notes, etc. As you mentioned, a confusion matrix involving 2^88 classes would be impossible to work with. Another possibility is to cluster the label vectors, then build a confusion matrix of predicted cluster versus actual cluster.
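
A sketch of the per-note 2x2 confusion matrix, again assuming 0/1 NumPy arrays shaped NxT (the function name is hypothetical):

    import numpy as np

    def per_note_confusion(actual, predicted, n):
        # 2x2 confusion matrix for note n over all frames:
        # rows = actual off/on, columns = predicted off/on.
        a = actual[n, :].astype(bool)
        p = predicted[n, :].astype(bool)
        return np.array([[np.sum(~a & ~p), np.sum(~a & p)],   # TN  FP
                         [np.sum( a & ~p), np.sum( a & p)]])  # FN  TP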

Upvotes: 1

phs

Reputation: 11061

To measure success, you need to define it. Choose an error tolerance you are willing to accept (perhaps zero), and count how many predictions have Hamming distances at or below it to get your percentage.
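
A sketch under the same NumPy-array assumption as above (success_percentage is a hypothetical helper name):

    import numpy as np

    def success_percentage(actual, predicted, tolerance=0):
        # Fraction of frames whose Hamming distance is at or below the tolerance.
        distances = np.sum(actual != predicted, axis=0)
        return 100.0 * np.mean(distances <= tolerance)

With a tolerance of zero, only exactly correct frames count; on the example data in the question that is 1 frame out of 5, i.e. 20%.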

If your training matrices are sparse (mostly zeros), this may be a misleading measure since a model that always predicts the zero matrix will do well. Here you may want to look at precision and recall. These form a natural tradeoff and so it's usually not possible to maximize both simultaneously. To combine them into a single metric, consider the f-score. Again, if your training data is not sparse, then the simple accuracy percentage is probably best.
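
A minimal micro-averaged sketch, pooling all (class, frame) cells; this is one of several reasonable ways to aggregate precision, recall, and the f-score over a multi-label matrix:

    import numpy as np

    def micro_precision_recall_f(actual, predicted):
        # Pool every (class, frame) cell, then compute precision, recall, f-score.
        a = actual.astype(bool)
        p = predicted.astype(bool)
        tp = np.sum(a & p)
        fp = np.sum(~a & p)
        fn = np.sum(a & ~p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f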

Finally, if you are measuring accuracy in order to select from among several possible models (called validation), then beware of reusing your training data for this step. Instead, partition your data into a training set and a cross-validation set. The trouble is that your models are already biased towards the data they were trained on; just because they do well on it doesn't mean they will generalize to what they might see in a real application. See the cross-validation wiki entry for more details.
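
A minimal sketch of such a partition over frame indices (the 70/30 split fraction is an arbitrary choice; for time-series data like audio frames, a contiguous split may leak less between neighbouring frames than a random one):

    import numpy as np

    def split_frames(num_frames, train_fraction=0.7, seed=0):
        # Randomly partition frame indices into training and validation sets.
        rng = np.random.default_rng(seed)
        order = rng.permutation(num_frames)
        cut = int(train_fraction * num_frames)
        return order[:cut], order[cut:]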

Upvotes: 1

ElKamina

Reputation: 7817

One solution is to convert the 4 non-mutually-exclusive binary classes into 2^4 = 16 mutually exclusive classes and construct the confusion matrix over those. If you have a large enough number of frames and a small enough number of binary classes, this is the most appropriate solution.
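
A sketch of this label-powerset encoding, assuming NumPy arrays as before (the integer codes are one arbitrary but convenient enumeration of the 16 classes):

    import numpy as np

    def powerset_confusion(actual, predicted):
        # Encode each frame's label column as an integer in [0, 2^N) and tally
        # a 2^N x 2^N confusion matrix: rows = actual code, cols = predicted code.
        n_classes = actual.shape[0]
        weights = 1 << np.arange(n_classes)   # [1, 2, 4, 8] when N = 4
        actual_codes = weights @ actual       # one integer code per frame
        predicted_codes = weights @ predicted
        size = 2 ** n_classes
        confusion = np.zeros((size, size), dtype=int)
        for a, p in zip(actual_codes, predicted_codes):
            confusion[a, p] += 1
        return confusion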

Another, simpler approach is to calculate recall and precision for each class separately.
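
A per-class sketch under the same assumptions:

    import numpy as np

    def per_class_precision_recall(actual, predicted):
        # One precision and recall value per class (per row of the matrices).
        a = actual.astype(bool)
        p = predicted.astype(bool)
        tp = np.sum(a & p, axis=1)
        # np.maximum(..., 1) avoids division by zero; tp is 0 in those cases anyway.
        precision = tp / np.maximum(np.sum(p, axis=1), 1)
        recall = tp / np.maximum(np.sum(a, axis=1), 1)
        return precision, recall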

Upvotes: 1
