Eitan Vesely

Best way to join different length column vectors into a matrix in MATLAB

Assuming I have a series of column vectors of different lengths, what would be the best way, in terms of computation time, to join them into one matrix whose number of rows is set by the longest column, with the extra cells of the shorter columns filled with NaN?

Edit: Please note that I am trying to avoid cell arrays, since they are expensive in terms of memory and run time.

For example:

A = [1;2;3;4]; 
B = [5;6];

C = magicFunction(A,B);

Result:

C =
  1  5
  2  6
  3 NaN
  4 NaN

Answers (2)

Divakar

The following code avoids cell arrays except for counting the elements of each vector, which keeps the code cleaner. The cost of using a cell array for that small step should be negligible, and varargin delivers the inputs as a cell array anyway. You could avoid cell arrays there too, but that would most likely involve for-loops and a separate variable name for each input, which isn't elegant in a function with an unknown number of inputs. Otherwise, the code uses numeric arrays, logical indexing and my favourite, bsxfun, which is usually cheap in terms of runtime.

Function Code

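function out = magicFunction(varargin)
%// Join column vectors of different lengths into one NaN-padded matrix.
%// All inputs are expected to be column vectors.

lens = cellfun(@numel,varargin);       %// number of elements in each input
out = NaN(max(lens),numel(lens));      %// preallocate the output with NaN
%// Mask is true where row index <= that column's length; logical indexing
%// fills the true cells in column-major order, so each input gets its own column
out(bsxfun(@le,(1:max(lens))',lens)) = vertcat(varargin{:});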
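To see why the single logical-indexing assignment in magicFunction puts each vector into its own column, it helps to look at the mask for the question's inputs A and B. The snippet below is an illustrative walk-through (the variable names are ours, not part of the answer): MATLAB fills the true entries of a logical index in column-major order, so the true entries of column 1 receive A and those of column 2 receive B.

```matlab
% Illustrative walk-through with the question's inputs
A = [1;2;3;4];
B = [5;6];
lens = [numel(A), numel(B)];                % [4 2]
mask = bsxfun(@le, (1:max(lens))', lens)    % 4x2 logical: [1 1; 1 1; 1 0; 1 0]
C = NaN(max(lens), numel(lens));            % start from an all-NaN matrix
C(mask) = [A; B]                            % [1 5; 2 6; 3 NaN; 4 NaN]
```

The number of true entries in the mask equals sum(lens), which is exactly numel([A; B]), so the assignment sizes always agree.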

Example

Script -

A1 = [9;2;7;8];
A2 = [1;5];
A3 = [2;6;3];
out = magicFunction(A1,A2,A3)

Output -

out =
     9     1     2
     2     5     6
     7   NaN     3
     8   NaN   NaN
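On MATLAB R2016b or newer (and in GNU Octave), implicit expansion can stand in for bsxfun. The script below is a sketch of that variant, not the answer's code; it assumes the inputs are collected in a cell array named vecs (a hypothetical name standing in for varargin) and forces each one into column orientation with x(:), so row vectors would also be accepted. It should reproduce the output above.

```matlab
% Sketch assuming MATLAB R2016b+ or GNU Octave (implicit expansion)
A1 = [9;2;7;8];
A2 = [1;5];
A3 = [2;6;3];
vecs = {A1, A2, A3};                                      % plays the role of varargin
vecs = cellfun(@(x) x(:), vecs, 'UniformOutput', false);  % force column orientation
lens = cellfun(@numel, vecs);                             % [4 2 3]
out = NaN(max(lens), numel(lens));                        % all-NaN output
out((1:max(lens))' <= lens) = vertcat(vecs{:})            % implicit expansion builds the mask
```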

Benchmarking

As part of the benchmarking, we compare our solution with @gnovice's solution, which is mostly based on cell arrays. The intention is to see what speedup, if any, we get by avoiding cell arrays. Here's the benchmarking code with 20 vectors -

%// Let's create row vectors A1,A2,A3.. to be used with @gnovice's solution
num_vectors = 20;
max_vector_length = 1500000;
vector_lengths = randi(max_vector_length,num_vectors,1);
vs =arrayfun(@(x) randi(9,1,vector_lengths(x)),1:numel(vector_lengths),'uni',0);
[A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20] = vs{:};


%// Maximally cell-array based approach used in linked @gnovice's solution
disp('--------------------- With @gnovice''s approach')
tic
tcell = {A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20};
maxSize = max(cellfun(@numel,tcell));    %# Get the maximum vector size
fcn = @(x) [x nan(1,maxSize-numel(x))];  %# Create an anonymous function
rmat = cellfun(fcn,tcell,'UniformOutput',false);  %# Pad each cell with NaNs
rmat = vertcat(rmat{:});
toc, clear tcell maxSize fcn rmat

%// Transpose each of the input vectors to get column vectors as needed
%// for our problem
vs = cellfun(@(x) x',vs,'uni',0); %//'
[A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20] = vs{:};

%// Our solution
disp('--------------------- With our new approach')
tic
out = magicFunction(A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,...
    A11,A12,A13,A14,A15,A16,A17,A18,A19,A20);
toc

Results -

--------------------- With @gnovice's approach
Elapsed time is 1.511669 seconds.
--------------------- With our new approach
Elapsed time is 0.671604 seconds.

Conclusions -

  1. With 20 vectors and a maximum length of 1500000, the speedup is roughly 2-3x, and it grew as the number of vectors was increased. Those additional results are not shown here to save space.
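Since @gnovice's approach pads row vectors while ours pads column vectors, the two results should be transposes of each other. A small consistency check along these lines, with made-up inputs and variable names, could be run before timing:

```matlab
% Made-up small inputs (row vectors, as in @gnovice's approach)
vs = {[1 2 3], [4 5], [6 7 8 9]};

% @gnovice's approach: one NaN-padded row per vector
maxSize = max(cellfun(@numel, vs));
fcn = @(x) [x nan(1, maxSize-numel(x))];
rmat = cellfun(fcn, vs, 'UniformOutput', false);
rmat = vertcat(rmat{:});

% bsxfun approach on the transposed (column) vectors
cols = cellfun(@(x) x.', vs, 'UniformOutput', false);
lens = cellfun(@numel, cols);
out = NaN(max(lens), numel(lens));
out(bsxfun(@le, (1:max(lens))', lens)) = vertcat(cols{:});

% Compare NaN positions and the non-NaN values after transposing rmat
rt = rmat.';
same = isequal(isnan(out), isnan(rt)) && isequal(out(~isnan(out)), rt(~isnan(rt)))  % should be true
```

NaN positions are compared separately because isequal treats NaN as unequal to itself.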

The Minion

If you use a cell array, you don't need anything filled with NaNs: just write each array into one column, and the unused cells stay empty (that is the space-efficient way). You could either use:

 cell_result{1} = A;
 cell_result{2} = B; 

This results in a 1x2 cell array whose two cells hold A and B. Or, if you want the elements saved as columns (one element per cell):

 cell_result(1:numel(A),1) = num2cell(A);
 cell_result(1:numel(B),2) = num2cell(B);
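If the column-wise cell array later has to become a NaN-padded numeric matrix, one way is to replace the empty cells with NaN and call cell2mat. This is a sketch for the question's A and B (the name cell_result follows the answer; the conversion step is our addition), and it gives up the memory saving mentioned above, since every cell then holds a value.

```matlab
A = [1;2;3;4];
B = [5;6];
cell_result = {};                                     % start from an empty cell array
cell_result(1:numel(A),1) = num2cell(A);              % A in column 1
cell_result(1:numel(B),2) = num2cell(B);              % B in column 2; cells below it stay empty ([])
cell_result(cellfun(@isempty, cell_result)) = {NaN};  % fill the empty cells with NaN
M = cell2mat(cell_result)                             % [1 5; 2 6; 3 NaN; 4 NaN]
```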

If you need the NaN padding for later code, the easiest way is to find the maximum length and create a matrix of size max_length x number_of_arrays filled with NaN.

So let's say you have n = 5 arrays: A, B, C, D and E.

n = 5;                           % number of arrays
h = zeros(1,n);                  % length of each array
h(1) = numel(A);
h(2) = numel(B);
h(3) = numel(C);
h(4) = numel(D);
h(5) = numel(E);
max_No_Entries = max(h);
result = NaN(max_No_Entries,n);  % preallocate with NaN in one step
result(1:numel(A),1) = A;
result(1:numel(B),2) = B;
result(1:numel(C),3) = C;
result(1:numel(D),4) = D;
result(1:numel(E),5) = E;
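The same idea generalizes to any number of arrays with a loop instead of one hand-written line per array. This is a sketch, assuming the inputs are column vectors gathered in a cell array (arrays is a hypothetical name, and the example values are made up):

```matlab
% Hypothetical example inputs; any number of column vectors works
arrays = {[1;2;3;4], [5;6], [7;8;9]};
n = numel(arrays);                   % number of arrays
h = cellfun(@numel, arrays);         % length of each array
result = NaN(max(h), n);             % NaN-filled max_length-by-n matrix
for k = 1:n
    result(1:h(k), k) = arrays{k};   % copy array k into column k
end
result                               % [1 5 7; 2 6 8; 3 NaN 9; 4 NaN NaN]
```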
