Reputation: 243
I have a binary classification problem that I need to solve in MATLAB. There are two classes, and both the training and testing data are 2D coordinates drawn from Gaussian distributions, one distribution per class.
The samples are 2D points (1000 samples for class A and 1000 samples for class B). I am posting just some of them here:
5.867766 3.843014 5.019520 2.874257 1.787476 4.483156 4.494783 3.551501 1.212243 5.949315 2.216728 4.126151 2.864502 3.139245 1.532942 6.669650 6.569531 5.032038 2.552391 5.753817 2.610070 4.251235 1.943493 4.326230 1.617939 4.948345
If a new test point comes in, how should I classify it?
P(Class/TestPoint) is proportional to P(TestPoint/Class) * (ProbabilityOfClass).
I am not sure of how we compute the P(Sample/Class) variable for the 2D coordinates given. Right now, I am using the formula
P(Coordinates/Class) = (Coordinates - mean for that class) / (standard deviation of points in that class).
However, I am not getting very good test results with this. Am I doing anything wrong?
Upvotes: 4
Views: 6660
Reputation: 7091
Assuming your formula is correctly applied, another issue could be the derivation of features from your data points. Your problem might not be suited for a linear classifier.
Upvotes: 0
Reputation: 16035
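That is the right approach, but the formula is not correct. See the multivariate Gaussian distribution article on Wikipedia:
$$P(\text{TestPoint} \mid \text{Class}) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\right),$$
where $x$ is the test point, $\mu$ and $\Sigma$ are the mean and covariance matrix of the class, $k$ is the dimension (here $k = 2$), and $|A|$ is the determinant of $A$.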
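% classPoint is 2-by-N: each column is one training point of the class
mu = mean(classPoint, 2);        % 2-by-1 class mean
Sigma = cov(classPoint');        % 2-by-2 sample covariance (cov expects one observation per row)
d = testPoint(:) - mu;           % 2-by-1 deviation of the test point from the mean
proba = 1/((2*pi)^(2/2)*det(Sigma)^(1/2))*...
exp(-1/2*d'*(Sigma\d));          % Sigma\d avoids forming inv(Sigma) explicitly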
In your case, since there are as many points in both classes, P(class) = 1/2 for each class.
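Putting the pieces together, a minimal sketch of the whole decision rule could look like the following. It assumes the training points are stored as N-by-2 matrices `A` and `B` (one point per row) and the test point as a 1-by-2 row vector `x`. Those names, the `gauss` helper, and the synthetic data are made up for illustration and are not from the question.

```matlab
% Synthetic stand-in data (illustration only): two 2D Gaussian clouds
rng(0);
A = randn(1000, 2) + repmat([2 4], 1000, 1);   % class A training points
B = randn(1000, 2) + repmat([5 3], 1000, 1);   % class B training points
x = [4.5 3.5];                                 % test point to classify

% 2D Gaussian density at p for mean mu (1-by-2) and covariance S (2-by-2)
gauss = @(p, mu, S) exp(-0.5 * (p - mu) / S * (p - mu)') / (2*pi*sqrt(det(S)));

% Class priors estimated from the class sizes (1/2 each for equal sizes)
priorA = size(A, 1) / (size(A, 1) + size(B, 1));
priorB = 1 - priorA;

% Unnormalised posteriors: P(x|class) * P(class)
scoreA = gauss(x, mean(A, 1), cov(A)) * priorA;
scoreB = gauss(x, mean(B, 1), cov(B)) * priorB;

% Pick the class with the larger posterior
if scoreA > scoreB
    label = 'A';
else
    label = 'B';
end
disp(label);
```

Since both scores share the same normalising constant P(x), comparing them directly is enough. If the densities underflow for points far from both means, comparing the logarithms of the two scores is a common alternative.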
Upvotes: 3