Reputation: 219
RECCELL
is a cell array with 8 columns and 30000 rows:
C1 C2 C3 C4 C5 C6 C7 C8
'AA' 1997 19970102 1 'BACHE' 'MORI' 148 127
'AA' 1997 19970108 2 'MORGAN' [] 1595 0
'AA' 1997 19970224 3 'KEMSEC' 'FATHI' 1315 297
CONCELL
is a cell array with 4 columns and 70000 rows:
C1 C2 D3 D4
'AA' 1997 19970116 2,75
'AA' 1997 19970220 2,71
'AA' 1997 19970320 2,61
I would like to add to RECCELL
the 4 columns of CONCELL
only in case the C1s
match and C3
and D3
(both dates) are the closest possible. For instance I would get in this example:
C1 C2 C3 C4 C5 C6 C7 C8 C1 C2 D3 D4
'AA' 1997 19970102 1 'BACHE' 'MORI' 148 127 'AA' 1997 19970116 2,75
'AA' 1997 19970108 2 'MORGAN' [] 1595 0 'AA' 1997 19970116 2,75
'AA' 1997 19970113 3 'KEMSEC' 'FATHI' 1315 297 'AA' 1997 19970220 2,71
The code I have so far is:
[~, indCon, indREC] = intersect(CONCELL(:,1), RECCELL(:,1));
REC_CON=[RECCELL(indREC,:),CONCELL(indCon,:)];
NO_REC_CON= RECCELL(setdiff(1:size(RECCELL,1), indREC),:);
It's wrong because I cannot use intersect
for a string element and because I am not considering the second condition, which is to choose the closest dates.
Can someone help me? Thank you
Upvotes: 1
Views: 183
Reputation: 71
I would suggest to do this inside a for loop as the cells are very tall. (Note: it seems like the date format (C3/D3) in the cell is a double opposed to a string, thus needs to be converted first for using datenum)
n=size(RECCELL,1);
ind=zeros(n,1);
rd=datenum(num2str(cell2mat(CONCELL(:,3))),'yyyymmdd'); % convert double to string
for k=1:n
a=find(ismember(CONCELL(:,1),RECCELL(k,1))==1); % find indices of matching C1s
if ~isempty(a) % do only if there is a match for the C1s
dnk=datenum(num2str(RECCELL{k,3}),'yyyymmdd'); % convert double to string
[~,f]=min((rd(a)-dnk).^2); % find closest date of the subset a
ind(k,1)=a(f); % assign index of closest match to ind
RECCELL(k,(end+1):(end+4))=CONCELL(ind(k,1),:); % add CONCELL to RECCELL, be aware that other rows will now display empty cells, and a row of RECCELL can keep 'growing'
end
end
The vector ind
contains the indices of the closest match in CONCELL
for each entry in RECCELL
. When it contains a 0, no match was found between the C1
s.
Edit: One possible solution to avoid increasing the number of columns of RECCELL
if multiple CONCELL
entries are added to the same RECCELL
entry is the following which results in a adding a single column to the RECCELL matrix:
n=size(RECCELL,1);
RECCELL{1,end+1}=[]; % to add a single empty column to RECCELL
ind=zeros(n,1);
rd=datenum(num2str(cell2mat(CONCELL(:,3))),'yyyymmdd'); % convert double to string
for k=1:n
a=find(ismember(CONCELL(:,1),RECCELL(k,1))==1); % find indices of matching C1s
if ~isempty(a) % do only if there is a match for the C1s
dnk=datenum(num2str(RECCELL{k,3}),'yyyymmdd'); % convert double to string
[~,f]=min((rd(a)-dnk).^2); % find closest date of the subset a
ind(k,1)=a(f); % assign index of closest match to ind
if isempty(RECCELL{k,end}) % if nothing is in this cell, add the CONCELL entry to it
RECCELL{k,end}=CONCELL(ind(k,1),:);
else % if something is already in, add the new CONCELL entry to the cell
RECCELL{k,end}(end+1,1:4)=CONCELL(ind(k,1),:);
end
end
end
Upvotes: 1