Andrew

Reputation: 749

Joining data from different cell arrays in Matlab

I have data in MATLAB stored as cell arrays, where the first row holds the column names and each column represents a different item. The cell arrays have different sets of columns, as in the following example:

a = {'A', 'B', 'C' ; 1, 1, 1; 2, 2, 2 }

a =

'A'    'B'    'C'
[1]    [1]    [1]
[2]    [2]    [2]

b = {'C', 'D'; 3, 3; 4, 4}

b =

'C'    'D'
[3]    [3]
[4]    [4]

I would like to join the cell arrays by column name, filling the missing entries with NaN, like this:

c =

'A'    'B'    'C'    'D'
[1]    [1]    [1]    [NaN]
[2]    [2]    [2]    [NaN]
[NaN]  [NaN]  [3]    [3]
[NaN]  [NaN]  [4]    [4]

In the real example I have hundreds of columns and several rows, so creating a new cell array manually is not an option for me.

Upvotes: 0

Views: 8031

Answers (3)

Amro

Reputation: 124573

Here is my solution, adapted from an old answer to a similar question (the rows and columns are simply transposed):

%# input cell arrays
a = {'A', 'B', 'C' ; 1, 1, 1; 2, 2, 2 };
b = {'C', 'D'; 3, 3; 4, 4};

%# transpose rows/columns
a = a'; b = b';

%# get all key values, and convert them to indices starting at 1
[allKeys,~,ind] = unique( [a(:,1);b(:,1)] );
indA = ind(1:size(a,1));
indB = ind(size(a,1)+1:end);

%# merge the two datasets (key,value1,value2)
c = cell(numel(allKeys), size(a,2)+size(b,2)-1);
c(:) = {NaN};                         %# fill with NaNs
c(:,1) = allKeys;                     %# available keys from both
c(indA,2:size(a,2)) = a(:,2:end);     %# insert 1st dataset values
c(indB,size(a,2)+1:end) = b(:,2:end); %# insert 2nd dataset values

Here is the result (transposed to match original orientation):

>> c'
ans = 
    'A'      'B'      'C'    'D'  
    [  1]    [  1]    [1]    [NaN]
    [  2]    [  2]    [2]    [NaN]
    [NaN]    [NaN]    [3]    [  3]
    [NaN]    [NaN]    [4]    [  4]

Alternatively, here is a solution using the dataset class from the Statistics Toolbox (applied to the original, untransposed a and b):

aa = dataset([cell2mat(a(2:end,:)) a(1,:)])
bb = dataset([cell2mat(b(2:end,:)) b(1,:)])
cc = join(aa,bb, 'Keys',{'C'}, 'type','fullouter', 'MergeKeys',true)

which gives:

cc = 
    A      B      C    D  
      1      1    1    NaN
      2      2    2    NaN
    NaN    NaN    3      3
    NaN    NaN    4      4

Upvotes: 1

Tom Lane


If you were willing to store your data in dataset arrays (or convert them to dataset arrays for this purpose), you could do the following:

>> d1
d1 = 
    A    B    C
    1    1    1
    2    2    2
>> d2
d2 = 
    C    D
    3    3
    4    4
>> join(d1,d2,'Keys','C','type','outer','mergekeys',true)
ans = 
    A      B      C    D  
      1      1    1    NaN
      2      2    2    NaN
    NaN    NaN    3      3
    NaN    NaN    4      4
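
For readers on newer MATLAB releases, where the table type is generally recommended over dataset, the same idea can be sketched with cell2table and outerjoin. This is not part of the original answer; it assumes the first row of each cell array holds valid variable names and that the remaining cells are numeric scalars. The last line converts the result back to a cell array with a header row, as requested in the question.

```matlab
% Example inputs from the question (original, untransposed orientation)
a = {'A', 'B', 'C'; 1, 1, 1; 2, 2, 2};
b = {'C', 'D'; 3, 3; 4, 4};

% First row holds the variable names, remaining rows hold the data
ta = cell2table(a(2:end,:), 'VariableNames', a(1,:));
tb = cell2table(b(2:end,:), 'VariableNames', b(1,:));

% Full outer join on the shared column C; missing numeric entries become NaN
tc = outerjoin(ta, tb, 'Keys', 'C', 'MergeKeys', true);

% Convert back to a cell array with a header row, as in the question
c = [tc.Properties.VariableNames; table2cell(tc)];
```

As with the dataset version, rows come back ordered by the key column, so the row order may differ from simple stacking when the key values of the two inputs interleave.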

Upvotes: 3

emrea

Reputation: 1355

I'm assuming you want to join the two arrays based on their first row only.

% get the list of all keys
keys = unique([a(1,:) b(1,:)]);

lena = size(a,1)-1;  lenb = size(b,1)-1;

% allocate space for the joined array
joined = cell(lena+lenb+1, length(keys));

joined(1,:) = keys;

% add a
tf = ismember(keys, a(1,:));
joined(2:(2+lena-1),tf) = a(2:end,:);

% add b
tf = ismember(keys, b(1,:));
joined((lena+2):(lena+lenb+1),tf) = b(2:end,:);

This gives you the joined array, except that it has empty cells instead of NaNs. I hope this is OK.
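
If NaN placeholders are required, one possible variant (a sketch, not part of the original answer) is to preallocate the result with NaN rather than with empty cells. The sketch below also places the columns using the second output of ismember instead of a logical mask, so the headers of a and b do not have to be in sorted order already. The variable names allKeys, locA and locB are introduced here for illustration only.

```matlab
% Example inputs from the question
a = {'A', 'B', 'C'; 1, 1, 1; 2, 2, 2};
b = {'C', 'D'; 3, 3; 4, 4};

% Union of the header names (unique returns them sorted)
allKeys = unique([a(1,:) b(1,:)]);
lena = size(a,1) - 1;
lenb = size(b,1) - 1;

% Preallocate with NaN instead of empty cells
joined = repmat({NaN}, lena + lenb + 1, numel(allKeys));
joined(1,:) = allKeys;

% Column position of each header of a and b within allKeys
[~, locA] = ismember(a(1,:), allKeys);
[~, locB] = ismember(b(1,:), allKeys);

% Copy the data rows of a, then the data rows of b, into their columns
joined(2:lena+1, locA) = a(2:end,:);
joined(lena+2:end, locB) = b(2:end,:);
```

To patch an already built array instead, empty cells can be replaced afterwards with joined(cellfun(@isempty, joined)) = {NaN};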

Upvotes: 3
