philippe
philippe

Reputation: 47

Join matrices with same values in different vectors in MATLAB

I am trying to merge two matrices that share same values in multiple columns.

The following matrices should exemplify my problem and provide a MWE. However, my data is very long size(500000, 4), therefore I am searching an efficient way to merge them. The data consists of option data where c are call and p put data, with columns 1:4: date, strike, expiry, bid-price. In the end I would like to have matrix with columns 1:5: date, strike, expiry, call-bid-price, put-bid-price. As shown in the MWE, the data is not of same length, but each combination of the columns 1:3 (date, strike, expiry) only exists once.

c = [7356011 300 7356081 1.15; 7356011 400 7356081 1.56; 7356011 500 7356081 1.79; 7356011 300 7356088 1.25; 7356011 400 7356088 1.67; 7356011 500 7356088 1.89; 7356011 600 7356088 1.92; 7356012 300 7356081 0.79; 7356012 400 7356081 0.99; 7356012 500 7356081 1.08; 7356012 300 7356088 0.81; 7356012 400 7356088 0.90; 7356012 500 7356088 1.07]

p = [7356011 300 7356081 1.35; 7356011 400 7356081 1.15; 7356011 500 7356081 1.03; 7356011 300 7356088 1.56; 7356011 400 7356088 1.15; 7356011 500 7356088 1.03; 7356012 300 7356081 1.25; 7356012 400 7356081 1.19; 7356012 500 7356081 1.02; 7356012 300 7356088 1.14; 7356012 400 7356088 0.98; 7356012 500 7356088 0.76; 7356012 600 7356088 0.56; 7356012 700 7356088 0.44]

I tried to build an ID for each column, by using strcat and num2str, and getting 'ID(1) = 73560113007356081' however this takes very long for the large amount of data. I have also tried to find a solution using, unique and ismember, but had trouble with the multiple columns.

Wished output would be:

7356011 300 7356081 1.15 1.35 7356011 400 7356081 1.56 1.15 7356011 500 7356081 1.79 1.03 7356011 300 7356088 1.25 1.56 7356011 400 7356088 1.67 1.15 7356011 500 7356088 1.89 1.03 7356011 600 7356088 1.92 NaN 7356012 300 7356081 0.79 1.25 7356012 400 7356081 0.99 1.19 7356012 500 7356081 1.08 1.02 7356012 300 7356088 0.81 1.14 7356012 400 7356088 0.90 0.98 7356012 500 7356088 1.07 0.76 7356012 600 7356088 NaN 0.56 7356012 700 7356088 NaN 0.44

Thanks for any help

Upvotes: 0

Views: 136

Answers (2)

user1543042
user1543042

Reputation: 3440

You don't need to use a loop, use intersect instead.

[~,ic,ip] = intersect(c(:, 1:3),p(:, 1:3),'rows');
m = [c(ic, :), p(ip,end)];

Edit: If you want to include NaNs where they don't intersect like the above poster.

function m = merge(c, p, nc, np)
    %check for input arg errors
    if nargin == 3
        np = nc;
    elseif nargin ~= 4
        disp('Please enter either 3 or 4 arguments')
        m = {};
        return
    end

    %make sure they are shaped the same
    nc = reshape(nc, 1, []);
    np = reshape(np, 1, []);

    %And have the same number of elements
    if numel(nc) ~= numel(np)
        disp('Please ensure arguments 3 and 4 have the same number of elements')
        m = {};
        return
    end

    %The columns that aren't being compared
    NotNC = find(~ismember(1:size(c,2), nc));
    NotNP = find(~ismember(1:size(p,2), np));

    %Find the matching rows
    [matches,ic,ip] = intersect(c(:, nc),p(:, np),'rows');

    %Put together matching rows with the other data not included in the match
    m1 = [matches, c(ic, NotNC), p(ip, NotNP)];

    %Find rows that did not matched
    NotIC = find(~ismember(1:size(c,1), ic));
    NotIP = find(~ismember(1:size(p,1), ip));

    %Put together data not in the matched set
    m2 = [c(NotIC, nc), c(NotIC, NotNC), nan(length(NotIC), size(NotNP,2))];
    m3 = [p(NotIP, np), nan(length(NotIP), size(NotNC,2)), p(NotIP, NotNP)];

    %merge all three lists
    m = [m1; m2; m3];

end

Upvotes: 1

SamuelNLP
SamuelNLP

Reputation: 4136

Ok, I did not understand if p is always bigger so I'll write the two solutions with an if.

if length(c) > length(p)
    xx = length(c);
    newm = [c NaN(xx, 1)];
    row = ismember(c, p, 'rows');

    newm(row, end) = p(row, end);
else
    xx = length(p);
    newm = [p(:,1:3) NaN(xx, 1) p(:, end)];

    row = ismember(p(:,1:3), c(:,1:3), 'rows');

    newm(row, 4) = c(row, end);
end

UPDATE:

this code works for your present example.

[row_p, row_c] = ismember(p(:,1:3), c(:,1:3), 'rows');

newm = [];

for ii = 1:length(row_p)
    if row_p(ii) == 1
        newm = [newm; p(ii, 1:3) c(row_c(ii), end) p(ii, end)];
    else
        newm = [newm; p(ii, 1:3) NaN p(ii, end)];
    end
end

[row_c, row_p] = ismember(c(:,1:3), p(:,1:3), 'rows');

for ii = 1:length(row_c)
    if row_c(ii) == 1
        newm = [newm; c(ii, 1:3) c(ii, end) p(row_p(ii), end)];
    else
        newm = [newm; c(ii, 1:3) c(ii, end) NaN];
    end
end

newm = unique(newm, 'rows');

Upvotes: 0

Related Questions