Reputation: 47
I am trying to merge two matrices that share same values in multiple columns.
The following matrices should exemplify my problem and provide a MWE. However, my data is very long size(500000, 4)
, therefore I am searching an efficient way to merge them. The data consists of option data where c
are call and p
put data, with columns 1:4
: date, strike, expiry, bid-price. In the end I would like to have matrix with columns 1:5
: date, strike, expiry, call-bid-price, put-bid-price. As shown in the MWE, the data is not of same length, but each combination of the columns 1:3
(date, strike, expiry) only exists once.
c = [7356011 300 7356081 1.15;
7356011 400 7356081 1.56;
7356011 500 7356081 1.79;
7356011 300 7356088 1.25;
7356011 400 7356088 1.67;
7356011 500 7356088 1.89;
7356011 600 7356088 1.92;
7356012 300 7356081 0.79;
7356012 400 7356081 0.99;
7356012 500 7356081 1.08;
7356012 300 7356088 0.81;
7356012 400 7356088 0.90;
7356012 500 7356088 1.07]
p = [7356011 300 7356081 1.35;
7356011 400 7356081 1.15;
7356011 500 7356081 1.03;
7356011 300 7356088 1.56;
7356011 400 7356088 1.15;
7356011 500 7356088 1.03;
7356012 300 7356081 1.25;
7356012 400 7356081 1.19;
7356012 500 7356081 1.02;
7356012 300 7356088 1.14;
7356012 400 7356088 0.98;
7356012 500 7356088 0.76;
7356012 600 7356088 0.56;
7356012 700 7356088 0.44]
I tried to build an ID for each column, by using strcat
and num2str
, and getting 'ID(1) = 73560113007356081' however this takes very long for the large amount of data. I have also tried to find a solution using, unique
and ismember
, but had trouble with the multiple columns.
Wished output would be:
7356011 300 7356081 1.15 1.35
7356011 400 7356081 1.56 1.15
7356011 500 7356081 1.79 1.03
7356011 300 7356088 1.25 1.56
7356011 400 7356088 1.67 1.15
7356011 500 7356088 1.89 1.03
7356011 600 7356088 1.92 NaN
7356012 300 7356081 0.79 1.25
7356012 400 7356081 0.99 1.19
7356012 500 7356081 1.08 1.02
7356012 300 7356088 0.81 1.14
7356012 400 7356088 0.90 0.98
7356012 500 7356088 1.07 0.76
7356012 600 7356088 NaN 0.56
7356012 700 7356088 NaN 0.44
Thanks for any help
Upvotes: 0
Views: 136
Reputation: 3440
You don't need to use a loop, use intersect
instead.
[~,ic,ip] = intersect(c(:, 1:3),p(:, 1:3),'rows');
m = [c(ic, :), p(ip,end)];
Edit:
If you want to include NaN
s where they don't intersect like the above poster.
function m = merge(c, p, nc, np)
%check for input arg errors
if nargin == 3
np = nc;
elseif nargin ~= 4
disp('Please enter either 3 or 4 arguments')
m = {};
return
end
%make sure they are shaped the same
nc = reshape(nc, 1, []);
np = reshape(np, 1, []);
%And have the same number of elements
if numel(nc) ~= numel(np)
disp('Please ensure arguments 3 and 4 have the same number of elements')
m = {};
return
end
%The columns that aren't being compared
NotNC = find(~ismember(1:size(c,2), nc));
NotNP = find(~ismember(1:size(p,2), np));
%Find the matching rows
[matches,ic,ip] = intersect(c(:, nc),p(:, np),'rows');
%Put together matching rows with the other data not included in the match
m1 = [matches, c(ic, NotNC), p(ip, NotNP)];
%Find rows that did not matched
NotIC = find(~ismember(1:size(c,1), ic));
NotIP = find(~ismember(1:size(p,1), ip));
%Put together data not in the matched set
m2 = [c(NotIC, nc), c(NotIC, NotNC), nan(length(NotIC), size(NotNP,2))];
m3 = [p(NotIP, np), nan(length(NotIP), size(NotNC,2)), p(NotIP, NotNP)];
%merge all three lists
m = [m1; m2; m3];
end
Upvotes: 1
Reputation: 4136
Ok, I did not understand if p is always bigger so I'll write the two solutions with an if
.
if length(c) > length(p)
xx = length(c);
newm = [c NaN(xx, 1)];
row = ismember(c, p, 'rows');
newm(row, end) = p(row, end);
else
xx = length(p);
newm = [p(:,1:3) NaN(xx, 1) p(:, end)];
row = ismember(p(:,1:3), c(:,1:3), 'rows');
newm(row, 4) = c(row, end);
end
UPDATE:
this code works for your present example.
[row_p, row_c] = ismember(p(:,1:3), c(:,1:3), 'rows');
newm = [];
for ii = 1:length(row_p)
if row_p(ii) == 1
newm = [newm; p(ii, 1:3) c(row_c(ii), end) p(ii, end)];
else
newm = [newm; p(ii, 1:3) NaN p(ii, end)];
end
end
[row_c, row_p] = ismember(c(:,1:3), p(:,1:3), 'rows');
for ii = 1:length(row_c)
if row_c(ii) == 1
newm = [newm; c(ii, 1:3) c(ii, end) p(row_p(ii), end)];
else
newm = [newm; c(ii, 1:3) c(ii, end) NaN];
end
end
newm = unique(newm, 'rows');
Upvotes: 0