user1067334

Reputation: 243

Implementing Naïve Bayes algorithm in MATLAB - Need some guidance

I have a binary classification problem that I need to solve in MATLAB. There are two classes, and both the training and the test samples are 2D coordinates drawn from Gaussian distributions.

The samples are 2D points (1000 samples for class A and 1000 samples for class B). Here are a few of them:

5.867766 3.843014 5.019520 2.874257 1.787476 4.483156 4.494783 3.551501 1.212243 5.949315 2.216728 4.126151 2.864502 3.139245 1.532942 6.669650 6.569531 5.032038 2.552391 5.753817 2.610070 4.251235 1.943493 4.326230 1.617939 4.948345

When a new test point comes in, how should I classify it?

P(Class|TestPoint) is proportional to P(TestPoint|Class) * P(Class).

I am not sure how to compute P(Sample|Class) for the given 2D coordinates. Right now, I am using the formula

P(Coordinates|Class) = (Coordinates - mean of that class) / (standard deviation of the points in that class).

However, I am not getting very good test results with this. Am I doing anything wrong?

Upvotes: 4

Views: 6660

Answers (2)

prusswan

Reputation: 7091

Assuming your formula is correctly applied, another issue could be the derivation of features from your data points. Your problem might not be suited for a linear classifier.

Upvotes: 0

Oli

Reputation: 16035

That is the right approach, but the formula is not correct. See the multivariate Gaussian distribution article on Wikipedia:

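P(TestPoint|Class) = $\frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\right)$,

where $|A|$ is the determinant of A, $x$ is the test point, $\mu$ and $\Sigma$ are the mean and covariance matrix of the class, and $k$ is the dimension (here $k = 2$).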

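 % Assumed layout: classPoint is 2-by-N (one training sample per column),
 % testPoint is a 2-by-1 column vector.
 mu = mean(classPoint,2);
 Sigma = cov(classPoint');   % sample covariance (centered and normalized)
 d = testPoint-mu;
 proba = 1/((2*pi)^(2/2)*det(Sigma)^(1/2))*...
         exp(-1/2*d'*(Sigma\d));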

In your case, since there are as many points in both classes, P(Class) = 1/2.
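
To put the pieces together, here is a minimal sketch of the whole decision rule. It is an illustration only: the training data is synthetic (the class means `[2; 5]` and `[6; 3]`, the unit spread, and the test point are made up), the samples are assumed to be stored one per column, and the names `logGauss`, `scoreA`, `scoreB`, and `predicted` are hypothetical. It compares log posteriors instead of raw probabilities, which gives the same decision and avoids underflow for points far from both classes.

```matlab
% Hypothetical training data: 2-by-N matrices, one sample per column.
% The means and spreads are made up for illustration.
N = 1000;
classA = repmat([2; 5], 1, N) + randn(2, N);
classB = repmat([6; 3], 1, N) + randn(2, N);

% Fit one Gaussian per class: mean vector and 2-by-2 covariance matrix.
muA = mean(classA, 2);  SigmaA = cov(classA');
muB = mean(classB, 2);  SigmaB = cov(classB');

% Log of the 2D Gaussian density at column vector x:
% -1/2*(x-mu)'*inv(Sigma)*(x-mu) - 1/2*log(det(Sigma)) - (2/2)*log(2*pi)
logGauss = @(x, mu, Sigma) -0.5 * ((x - mu)' * (Sigma \ (x - mu))) ...
                           - 0.5 * log(det(Sigma)) - log(2*pi);

% Equal priors, since both classes have the same number of samples.
priorA = 0.5;
priorB = 0.5;

% Score = log P(TestPoint|Class) + log P(Class); pick the larger one.
testPoint = [2.5; 4.5];
scoreA = logGauss(testPoint, muA, SigmaA) + log(priorA);
scoreB = logGauss(testPoint, muB, SigmaB) + log(priorB);

if scoreA > scoreB
    predicted = 'A';
else
    predicted = 'B';
end
fprintf('Predicted class: %s\n', predicted);
```

Note that this uses the full covariance matrix of each class. A strictly "naive" Bayes variant would treat the two coordinates as independent, which amounts to keeping only the diagonal, for example `diag(diag(SigmaA))`.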

Upvotes: 3
