Reputation: 2901
I am a total beginner in Matlab and trying to write some Machine Learning Algorithms in Matlab. I would really appreciate it if someone can help me in debugging this code.
function y = KNNpredict(trX,trY,K,X)
% trX is NxD, trY is Nx1, K is 1x1 and X is 1xD
% we return a single value 'y' which is the predicted class
% TODO: write this function
% int[] distance = new int[N];
distances = zeroes(N, 1);
examples = zeroes(K, D+2);
i = 0;
% for(every row in trX) { // taking ONE example
for row=1:N,
examples(row,:) = trX(row,:);
%sum = 0.0;
%for(every col in this example) { // taking every feature of this example
for col=1:D,
% diff = compute squared difference between these points - (trX[row][col]-X[col])^2
diff =(trX(row,col)-X(col))^2;
sum += diff;
end % for
distances(row) = sqrt(sum);
examples(i:D+1) = distances(row);
examples(i:D+2) = trY(row:1);
end % for
% sort the examples based on their distances thus calculated
sortrows(examples, D+1);
% for(int i = 0; i < K; K++) {
% These are the nearest neighbors
pos = 0;
neg = 0;
res = 0;
for row=1:K,
if(examples(row,D+2 == -1))
neg = neg + 1;
else
pos = pos + 1;
%disp(distances(row));
end
end % for
if(pos > neg)
y = 1;
return;
else
y = -1;
return;
end
end
end
Thanks so much
Upvotes: 0
Views: 340
Reputation: 124543
When working with matrices in MATLAB, it is usually better to avoid excessive loops and instead use vectorized operations whenever possible. This will usually produce faster and shorter code.
In your case, the k-nearest neighbors algorithm is simple enough and can be well vectorized. Consider the following implementation:
function y = KNNpredict(trX, trY, K, x)
%# euclidean distance between instance x and every training instance
dist = sqrt( sum( bsxfun(@minus, trX, x).^2 , 2) );
%# sorting indices from smaller to larger distances
[~,ord] = sort(dist, 'ascend');
%# get the labels of the K nearest neighbors
kTrY = trY( ord(1:min(K,end)) );
%# majority class vote
y = mode(kTrY);
end
Here is an example to test it using the Fisher-Iris dataset:
%# load dataset (data + labels)
load fisheriris
X = meas;
Y = grp2idx(species);
%# partition the data into training/testing
c = cvpartition(Y, 'holdout',1/3);
trX = X(c.training,:);
trY = Y(c.training);
tsX = X(c.test,:);
tsY = Y(c.test);
%# prediction
K = 10;
pred = zeros(c.TestSize,1);
for i=1:c.TestSize
pred(i) = KNNpredict(trX, trY, K, tsX(i,:));
end
%# validation
C = confusionmat(tsY, pred)
The confusion matrix of the kNN prediction with K=10:
C =
17 0 0
0 16 0
0 1 16
Upvotes: 2