zegkljan
zegkljan

Reputation: 8419

Merging sorted pairs

I have two (or more but if solved for two, it's solved for any number) 2-by-N matrices which represent points with an x (the first row) and y (the second row) coordinates. The points are always sorted in the increasing x coordinate. What I want to do is I want to merge these two matrices into one 3-by-N matrix so that if two points (one from each matrix) have the same x coordinate, they would form one column in the new matrix, the first row being the x coordinate and the second and third row being the two y coordinates. However, if there is a point in one matrix that has x coordinate different than all other points in the second matrix, I still want to have full 3-element column that is placed such that the x coordinates are still sorted and the missing value from the other matrix is replaced by the nearest value with lower x coordinate (or NaN if there is none).

Better to explain by example.

First matrix:

1  3  5  7  % x coordinate
1  2  3  4  % y coordinate

Second matrix:

2  3  4  7  8  % x coordinate
5  6  7  8  9  % y coordinate

Desired result:

1    2  3  4  5  7  8  % x coordinate
1    1  2  2  3  4  4  % y coordinate from first matrix
NaN  5  6  7  7  8  9  % y coordinate from second matrix

My question is, how can I do it effectively in matlab/octave and numpy? (Effectively because I can always do it "manually" with loops but that doesn't seem right.)

Upvotes: 1

Views: 91

Answers (3)

Wolfie
Wolfie

Reputation: 30175

Your example:

a = [1 3 5 7; 1 2 3 4];
b = [2 3 4 7 8; 5 6 7 8 9];
% Get the combined (unique, sorted) `x` coordinates 
output(1,:) = unique([a(1,:), b(1,:)]);
% Initialise y values to NaN
output(2:3, :) = NaN;   
% Add x coords from `a` and `b`
output(2, ismember(output(1,:),a(1,:))) = a(2,:);
output(3, ismember(output(1,:),b(1,:))) = b(2,:);
% Replace NaNs in columns `2:end` with the previous value. 
% A simple loop has the advantage of capturing multiple consecutive NaNs.
for ii = 2:size(output,2)
    colNaN = isnan(output(:, ii));
    output(colNaN, ii) = output(colNaN, ii-1);
end

If you have more than 2 matrices (as suggested in your question) then I'd advise

  • Store them in a cell array, and loop over them to do the calls to ismember, instead of having one code line per matrix hardcoded.
  • The NaN replacement loop is already vectorised for any number of rows.

This is the generic solution for any number of matrices, demonstrated with a and b:

mats = {a, b};
cmats = horzcat(mats);
output(1, :) = unique(cmats(1,:));
output(2:numel(mats)+1, :) = NaN;
for ii = 1:size(mats)
    output(ii+1, ismember(output(1,:), mats{ii}(1,:))) = mats{ii}(2,:);
end
for ii = 2:size(output,2)
    colNaN = isnan(output(:,ii));
    output(colNaN, ii) = output(colNaN, ii-1);
end

Upvotes: 1

Nicky Mattsson
Nicky Mattsson

Reputation: 3052

You can do it with interp1 and the keyword 'previous' for strategy (you can also choose 'nearest' if you do not care if it is larger or smaller) and 'extrap' for allowing extrapolation.

Define the matrices

a=[...
1  3  5  7;... 
1  2  3  4];

b=[...
2  3  4  7  8;...
5  6  7  8  9];

Then find the interpolation points

x = unique([a(1,:),b(1,:)]);

And interpolate

[x ; interp1(a(1,:),a(2,:),x,'previous','extrap') ; interp1(b(1,:),b(2,:),x,'previous','extrap') ]

Timeit results:

I tested the algorithms on

n = 1e6;
a = cumsum(randi(3,2,n),2);
b = cumsum(randi(2,2,n),2);

and got:

  • Wolfie: 1.7473 s
  • Flawr: 0.4927 s
  • Mine: 0.2757 s

Upvotes: 5

flawr
flawr

Reputation: 11638

This verions uses set operations:

a=[...
1  3  5  7;... 
1  2  3  4];

b=[...
2  3  4  7  8;...
5  6  7  8  9];

% compute union of x coordinates
c = union(a(1,:),b(1,:));

% find indices of x of a and b coordinates in c
[~,~,ia] = intersect(a(1,:),c); 
[~,~,ib] = intersect(b(1,:),c);

% create output matrix
d = NaN(3,numel(c));
d(1,:) = c;
d(2,ia) = a(2,:);
d(3,ib) = b(2,:);

% fill NaNs
m = isnan(d);
m(:,1) = false;
i = find(m(:,[2:end,1])); %if you have multiple consecutive nans you have to repeat these two steps
d(m) = d(i);

disp(d);

Try it online!

Upvotes: 2

Related Questions