Reputation: 2133
According to this question, I should try to use Preallocation is Matlab.
Now I have a situation that I cannot calculate the exact size of the matrix to preallocate. I can guess the size.
suppose the actual size of the matrix is 100, but I don't know it. Sh
Which scenario is more efficient:
Thanks.
Upvotes: 4
Views: 431
Reputation: 1409
In addition to the other educating answers, The short short version: There are three cases:
Upvotes: 2
Reputation: 26069
+1 for the interesting question.
EDITED Answer: From a little experimental study at first it seems better to add rows later, but it now seems more efficient overesrimated and preallocate again when you have the info of the correct size . I started with matrix size 3000 and guessed an error of 10% in the size estimation, see below:
clear all
clc
guess_size=3000;
m=zeros(guess_size);
%1. oops overesrimated, take out rows
tic
m(end-300:end,:)=[];
toc
%1b. oops overesrimated, preallocate again
tic
m=zeros(guess_size-300,guess_size);
toc
%2. oops overesrimated, take out cols
m=zeros(guess_size);
tic
m(:,end-300:end)=[];
toc
%2b. oops overesrimated, preallocate again
m=zeros(guess_size);
tic
m=zeros(guess_size,guess_size-300);
toc
%3. oops underesrimated, add rows
m=zeros(guess_size);
tic
m=zeros(guess_size+300,guess_size);
toc
%4. oops underesrimated, add cols
m=zeros(guess_size);
tic
m=zeros(guess_size,guess_size+300);
toc
Elapsed time is 0.041893 seconds.
Elapsed time is 0.026925 seconds.
Elapsed time is 0.041818 seconds.
Elapsed time is 0.023425 seconds.
Elapsed time is 0.027523 seconds.
Elapsed time is 0.029509 seconds.
Option 2b and 1b are slightly faster than underestimating, so if you can, better overestimate and then preallocate again. It is never efficient to delete rows from an array. Also adding columns seems slightly more efficient, but this is just a quick and dirty job. See @Shai detailed answer for the inner workings...
Upvotes: 2
Reputation: 114816
To my opinion, the answer is a bit more complex than portrayed by @natan. I think there are two factors his answer does not take into account:
Possible necessary copies of memory: when you under-estimate a matrix size and you re-allocate it, all its old values should be copied to the new allocated location.
Continuity of memory chunks: sometimes Matlab is able to allocate new memory continuously at the end of the old matrix. In principle, in such a scenario the old values need not be copied to the new location - since it is the same as the old one just bigger. However, if you add rows to a 2D matrix, the content needs to be copied even in this scenario, since Matlab stores matrices in a row-major fashion in memory.
So, my answer is this:
First of all, what exactly don't you know about the size of the matrix: if you know one dimension - make it the number of rows of your matrix, so you'll only need to change the number of columns. This way, if your already stored data needs to be copied, it would be copied at larger chunks.
Second, it depends on how much free RAM you have at your disposal. If you are not short at RAM, then there's nothing wrong with over estimating.
However, if you are short at RAM, consider under estimating. BUT when you re-allocate, increase the new block size at each iteration:
BASIC_SIZE = X; % first estimate
NEW_SIZE = Y; % if need more, add this amount
factor = 2;
arr = zeros( m, BASIC_SIZE ); % first allocation, assuming we know number of rows
while someCondition
% process arr ...
if needMoreCols
arr(:, size(arr,2) + (1:NEW_SIZE) ) = 0; % allocate another block
NEW_SIZE = round(NEW_SIZE * factor); % it seems like we are off in estimation, try larger chunk next time factor should be > 1
end
end
arr = arr(:, 1:actualNumOfCols ); % resize to actual size, discard unnecessary columns
Upvotes: 3