Reputation: 1886
I am working on a simple AI program that classifies shapes using unsupervised learning method. Essentially I use the number of sides and angles between the sides and generate aggregates percentages to an ideal value of a shape. This helps me create some fuzzingness in the result.
The problem is how do I represent the degree of error or confidence in the classification? For example: a small rectangle that looks very much like a square would yield night membership values from the two categories but can I represent the degree of error?
Thanks
Upvotes: 2
Views: 756
Reputation: 1445
Building slightly on what lejlot has put forward; my preference would be to use the Mahalanobis distance with some squashing function. The Mahalanobis distance M(V, p)
allows you to measure the distance between a distribution V
and a point p
.
In your case, I would use "perfect" examples of each class to generate the distribution V
and p
is the classification you want the confidence of. You can then use something along the lines of the following to be your confidence interval.
1-tanh( M(V, p) )
Upvotes: 0
Reputation: 66805
Your confidence is based on used model. For example, if you are simply applying some rules based on the number of angles (or sides), you have some multi dimensional representation of objects:
feature 0, feature 1, ..., feature m
Nice, statistical approach
You can define some kind of confidence intervals, baesd on your empirical results, eg. you can fit multi-dimensional gaussian distribution to your empirical observations of "rectangle objects", and once you get a new object you simply check the probability of such value in your gaussian distribution, and have your confidence (which would be quite well justified with assumption, that your "observation" errors have normal distribution).
Distance based, simple approach
Less statistical approach would be to directly take your model's decision factor and compress it to the [0,1] interaval. For example, if you simply measure distance from some perfect shape to your new object in some metric (which yields results in [0,inf)) you could map it using some sigmoid-like function, eg.
conf( object, perfect_shape ) = 1 - tanh( distance( object, perfect_shape ) )
Hyperbolic tangent will "squash" values to the [0,1] interval, and the only remaining thing to do would be to select some scaling factor (as it grows quite quickly)
Such approach would be less valid in the mathematical terms, but would be similar to the approach taken in neural networks.
Relative approach
And more probabilistic approach could be also defined using your distance metric. If you have distances to each of your "perfect shapes" you can calculate the probability of an object being classified as some class with assumption, that classification is being performed at random, with probiability proportional to the inverse of the distance to the perfect shape.
dist(object, perfect_shape1) = d_1
dist(object, perfect_shape2) = d_2
dist(object, perfect_shape3) = d_3
...
inv( d_i )
conf(object, class_i) = -------------------
sum_j inv( d_j )
where
inv( d_i ) = max( d_j ) - d_i
Conclusions
First two ideas can be also incorporated into the third one to make use of knowledge of all the classes. In your particular example, the third approach should result in confidence of around 0.5
for both rectangle and circle, while in the first example it would be something closer to 0.01
(depending on how many so small objects would you have in the "training" set), which shows the difference - first two approaches show your confidence in classifing as a particular shape itself, while the third one shows relative confidence (so it can be low iff it is high for some other class, while the first two can simply answer "no classification is confident")
Upvotes: 2