Reputation: 989
A
and B
are matrices consisting of binary elements. A
is denoted as the base data matrix and B
is the query matrix. A
consists of 75 data points each of length 10 and B
consists of 50 data points each of length 10. I want to calculate the distance between all data points in A
and every query data point in B
in order to apply nearest neighbor search. So instead of using the Euclidean or the hamming distance, I have used another metric :
N = 2
, k = length of data samples
, s = A(1,:)
and t = B(1,:)
.
The code works for one data sample in A
and another data sample in B
. How do I scale so that it works for all base data points and all query data points?
Let A(1,:) = [1,0,1,1,0,0,0,1,1,0]
is the first sample in A matrix. Let B(1,:) = [1,1,0,0,1,1,1,1,0,0]
is the first query point.
If the elements in samples taken from A
and B are same, 0 is recorded for each similar element, otherwise 1. The final distance is the sum of the 1's. So the program checks to see if two sequences are the same, setting b
to 1 if
so, or a zero otherwise. Can somebody please show how I can apply this to matrices?
l = length(A);
D=zeros(1,l);
for i=1:l,
if A(1,i)==B(1,i),
D(1,i)=0;
else
D(1,i)=1;
end
end
sum=0;
for j=1:l,
sum=sum+D(1,j);
end
if sum==0,
b = 1;
else
b = 0;
end
Upvotes: 1
Views: 1095
Reputation: 45752
Your distance metric is actually just the L1-norm i.e. sum(abs(x-y))
so in Octave you can use pdist2
like this:
pdist2(A,B,'L1')
In MATLAB you can use city block distance:
pdist2(A,B,'cityblock')
Note, to define your own distance metric (but 'cityblock'
is a better idea):
pdist2(A,B,@(x,y)sum(abs(bsxfun(@minus,x,y)),2))
Or
pdist2(A,B,@(x,y)sum(bsxfun(@xor,x,y),2))
Or
pdist2(A,B,@(x,y)sum(bsxfun(@ne,x,y),2))
The distance of one of your vectors with another can be found like this:
distance = @(x,y)sum(x~=y)
However you want to compare all the row of A
with all the rows of B
. bsxfun
is going to be useful here, we just need to use permute
to make one of the matrices go into the third dimension:
D = squeeze(sum(bsxfun(@ne, permute(B, [3,2,1]),A),2))'
For example if
A = [1,1,0;
0,0,1;
1,1,1];
B = [1,1,1;
0,0,0;
1,1,1;
0,1,0]
Then
> D = squeeze(sum(bsxfun(@ne, permute(B, [3,2,1]),A),2))'
D =
1 2 0
2 1 3
1 2 0
1 2 2
So the columns are now the rows of A
and he rows are the rows of B
so D(2,3)
means compare B(2,:)
(which is [0,0,0]
with A(3,:)
which is [1,1,1]
, and so since all the elements are different, the distance between them is 3
.
If you have the stats toolbox then you can use my distance
function above with pdist2
instead.
Upvotes: 2
Reputation: 1492
thats what i understood from your description(correct me if I am wrong) Let A be a matrix
A =
0 1 0 0 1 0 0 1 0 1
1 1 0 0 1 1 1 0 0 1
0 1 1 0 0 1 1 0 1 1
0 1 1 1 1 1 1 0 0 0
1 1 0 0 0 1 0 0 1 1
0 0 0 0 0 1 0 0 1 0
1 0 1 0 1 0 0 1 1 0
1 1 0 0 1 0 1 0 0 0
1 0 1 0 0 1 0 0 1 0
0 0 1 0 1 0 1 1 0 0
and B be
B =
0 1 1 1 0 1 0 0 1 0
0 1 0 1 0 0 1 1 1 1
1 0 0 1 1 0 0 0 1 1
0 1 0 1 1 0 0 0 1 0
1 1 0 0 0 0 0 0 0 1
1 1 0 1 1 1 1 1 1 0
1 1 0 0 1 0 0 1 0 1
0 0 1 1 0 1 1 1 0 1
0 0 0 1 1 0 0 0 1 0
0 1 1 0 0 0 1 1 0 1
>> C = A.*B
will give you common points between them if A has more number of row lets say then you can do is A(1:size(B,1),:).*B
instead
C =
0 1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 1 1
0 1 0 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0 1 0
1 0 0 0 1 0 0 1 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 1 0 0 0 1 1 0 0
In the find if there are no matching points b = 1 otherwise 0
b = ~find(sum(sum(C)))
Update : if D should be 75x50 like you say so then C should be
C = A*(B.')
instead of
C = A.*B
because first i thought its a dot comparison from your code
Upvotes: 0
Reputation: 104555
If you want to get your code working as is with loops, simply allocate space that is size(A,1) x size(B,1)
large so that each spatial location (i,j)
gives you "distance" between rows i
and j
.
Therefore, do something like this. This is assuming that A
is a M x d
matrix and B
is a N x d
matrix where d
are the total number of feature points and M
and N
are arbitrary positive numbers that denote how many rows of elements there are in each.
b = zeros(size(A,1), size(B,1)); % Change
l = size(A,2); % Change - Total number of feature points
for ii = 1 : size(A,1) % Change
for jj = 1 : size(B,1)
D=zeros(1,l);
for i=1:l,
if A(ii,i)==B(jj,i) % Change
D(i)=0;
else
D(i)=1;
end
end
sum=0;
for j=1:l,
sum=sum+D(j);
end
if sum==0,
b(ii,jj) = 1; % Change
end
end
end
This will iterate over all combinations of rows. However, use any of the previous answers here to get it vectorized. I just wanted to show you how you would modify your current code if this is what you're most comfortable with.
Upvotes: 2
Reputation: 5822
One line solution
This calculation can be done in a single line of code:
D = A*B'+(1-A)*(1-B)' < size(A,2)
Explanation
Do to the fact that A and B are binary, the distance function between each sample at A and each sample at B basically checks if the amount of per-coordinates matches is equal to a sample's length. You can use matrix multiplication to achieve this.
More descriptive code example
Define A and B as two binary matrices as you mentioned in your answer:
%initializes A and B randomly
A = double(rand(75,10) > 0.5);
B = double(rand(50,10) > 0.5);
[m,n] = size(A);
The distance between each sample in A and each sample in B can be calculated as follows:
First, define a matrix D of size 75x50, s.t D(i,j) is contains the number of matches between the sample i in A and the sample j in B.
It can be calculated as follows:
D = A*B' + (1-A)*(1-B)';
The final distance measure can be done by testing for each pair (i,j) if their amount of matches is smaller than n (n is the length of each sample). If it is smaller the samples are different and the result should be 1. Otherwise it should be zero. this can be done as follows:
finalDist = D < n ;
Upvotes: 5