Mushi
Mushi

Reputation: 367

Finding closest (not equal) rows in two matrices matlab

I have two matrices A(m X 3) and B(n X 3); where m >> n.

Numbers in B have close or equal values to numbers in A.

I want to search closest possible values from A to the values present in B in a way that at the end of search, A will reduced to (n X 3).

There are two main issues:

  1. Only a complete row from A can be compared to a complete row in B, where numbers in each column of A and B are varying independently.
  2. Numbers in A and B may be as close as third place of decimal (e.g. 20.101 and 20.103)

I hope I am clear in asking my question. Does anybody know about any function already present in matlab for this thing?

Upvotes: 1

Views: 1339

Answers (3)

chappjc
chappjc

Reputation: 30579

Depending on how you look at the task, here are two different approaches

Minimum Distance to Each Row in Second Matrix

Two ways to look at this: (1) closest point in A for each point in B, or (2) closest point in B for each point in A.

Closest point in A

For each point in B you can find the closest point in A (e.g. Euclidean distance), as requested in comments:

% Calculate all MxN high-dimensional (3D space) distances at once
distances = squeeze(sum(bsxfun(@minus,B,permute(A,[3 2 1])).^2,2));

% Find closest row in A for each point in B
[~,ik] = min(distances,[],2)

Make an array the size of B containing these closest points in A:

Anew = A(ik,:)

This will implicitly throw out any points in A that are too far from points in B, as long as each point in B does have a match in A. If each point in B does not necessarily have a "match" (point at an acceptable distance) in A, then it is necessary to actively reject points based on distances, resulting in an output that would be shorter than B. This solution seems out of scope.

Closest point in B

Compute the Euclidenan distance from each point (row) in A to each point in B and identify the closest point in B:

distances = squeeze(sum(bsxfun(@minus,A,permute(B,[3 2 1])).^2,2));
[~,ik] = min(distances,[],2)

Make an array the size of A containing these closest points in B:

Anew = B(ik,:)

The size of Anew in this approach is the same as A.

Merging Similar Points in First Matrix

Another approach is to use the undocumented _mergesimpts function.

Consider this test data:

>> B = randi(5,4,3)
B =
     1     4     4
     2     3     4
     1     3     4
     3     4     5
>> tol = 0.001;
>> A = repmat(B,3,1) + tol * rand(size(B,1)*3,3)
A =
    1.0004    4.0005    4.0000
    2.0004    3.0005    4.0008
    1.0004    3.0009    4.0002
    3.0008    4.0005    5.0004
    1.0006    4.0004    4.0007
    2.0008    3.0007    4.0004
    1.0009    3.0007    4.0007
    3.0010    4.0005    5.0004
    1.0002    4.0003    4.0007
    2.0001    3.0001    4.0007
    1.0007    3.0006    4.0004
    3.0001    4.0003    5.0000

Merge similar rows in A according to a specified tolerance, tol:

>> builtin('_mergesimpts',A,tol,'average')
ans =
    1.0004    4.0004    4.0005
    1.0007    3.0007    4.0005
    2.0005    3.0005    4.0006
    3.0006    4.0004    5.0003

Merge similar rows, using B to get expected numbers

>> builtin('_mergesimpts',[A; B],tol,'first')
ans =
     1     3     4
     1     4     4
     2     3     4
     3     4     5

Upvotes: 4

Luis Mendo
Luis Mendo

Reputation: 112679

To replace each row of A by the closest row of B

You can use pdist2 to compute distance between rows, and then the second output of min to find the index of the minimum-distance row:

[~, ind] = min(pdist2(B,A,'euclidean')); %// or specify some other distance
result = B(ind,:);

The advantage of this approach is that pdist2 lets you specify other distance functions, or even define your own. For example, to use L1 distance change first line to

[~, ind] = min(pdist2(B,A,'cityblock'));

To retain rows of A which are closest to rows of B

Use pdist2 as above. For each row of A compute the minimum distance to rows of B. Retain the n rows of A with lowest value of that minimum distance:

[~, ii] = sort(min(pdist2(B,A,'euclidean'))); %// or use some other distance
result = A(ii(1:n),:);

Upvotes: 1

Divakar
Divakar

Reputation: 221564

Try this code -

%% Create data
m=7;
n=4;
TOL = 0.0005;

A = rand(m,3)/100;
B = rand(n,3)/100;
B(2,:) = A(5,:); % For testing that the matching part of the second row from B must be the fifth row from A

%% Interesting part
B2 = repmat(reshape(B',1,3,n),[m 1]);
closeness_matrix = abs(bsxfun(@minus, A, B2));
closeness_matrix(closeness_matrix<TOL)=0;
closeness_matrix_mean = mean(closeness_matrix,2);  % Assuming that the "closeness" for the triplets on each row can be measured by the mean value of them
X1 = squeeze(closeness_matrix_mean);

[minval,minind] = min(X1,[],1);
close_indices = minind';

A_final = A(close_indices,:)

Upvotes: 0

Related Questions