Reputation: 2332
I'm not well versed in MATLAB and was curious about how it handles dynamic memory allocation under the hood?
One primary method is to allocate large chunks and more than is necessary so that you don't have to allocate for each new element added. In doing a bit of research, I saw a lot of people who personally managed their own large-chunk allocation (assuming they didn't know their final size) or did things like creating a maximum size, then trimming. An example is Undocumented MATLAB which advises you to perform block memory allocation yourself. I would have thought that a language like MATLAB would know to do it itself and I wouldn't be required to concern myself with issues like that. This implies to me if you try to append a single new element on to an array, MATLAB allocates new memory for only that single element, which is horribly inefficient.
My question is two-fold
Upvotes: 7
Views: 4548
Reputation: 8401
I threw together a test function to infer some more concrete details about dynamic allocation. The function used is at the bottom of the answer.
The function calls several location functions that calculate a value y
from an input x
of varying, logarithmically-spaced length. The functions differ but loop type (for
and while
) and allocation scheme (dynamic and pre-allocate). The results are from R2014b.
The four functions elapsed time results are shown below.
For low element count, the run time is constant; for high element count, all four functions enter a growth regime with similar rates of increase.
A power fit of the data (i.e., c = ArgMin[c(1)*n^c(2) - time]
) returns an exponent ranging from 0.93 - 0.96 (linear growth) across the four data sets.
I'm really surprised by these results as either I'm missing something in the test or Matlab's JIT compiler is really good at allocating arrays (possibly linked-lists under the hood).
The pre-allocated for
-loop did run the fastest but only 12% faster than the dynamic version.
Turning to memory consumption, I ran the dynamic and pre-allocated versions of the for
-loop variant for several linearly increasing elements counts in the ten million element range.
The total memory used by Matlab was sampled at the start and end of the loops.
The results are shown below.
As shown, the starting memory for the function calls is identical (for the most part) between the two functions. The memory usage for the pre-allocated version increases linearly with element count, which is expected. But the memory used for the dynamic version, while still piece-wise and trend-wise linear, has jumps in the slope of memory at several points. To me, this implies Matlab is doing some form of table growth (not necessarily table-doubling like I think Python does) to not re-allocate the array every iteration (which does make me question my thought about linked-lists from the above elapsed time discussion).
I found these results interesting but cannot really conclude much more beyond the thoughts above. However, regardless of anything, explicit pre-allocation will always better for both run time and code maintenance in any numeric heavy application.
Any comments about the results and test function (possibly discussing how this test is just plain wrong) are welcome, of course. That's how I (we) learn.
function [] = test()
% Constants and setup
funs = {@forDynamic,@forAllocate,@whileDynamic,@whileAllocate};
Nfun = numel(funs) ;
%
Nalloc = 2E7 ;
Nsamp = 50 ;
%
x = linspace(0,1,Nalloc) ;
nIndex = round(logspace(1,log10(Nalloc),Nsamp)) ;
times = repmat({zeros(1,Nsamp)},1,Nfun) ;
% Array growth time-data
for k = 1:numel(funs)
f = funs{k};
for m = 1:Nsamp
tic;
f(x(1:nIndex(m)));
times{k}(m) = toc;
fprintf(['Iteration ',num2str(m,'%02G'),' of function ',num2str(k),' done.\n']);
end
end
% Plot
figure(1);
args(2:2:2*Nfun) = times;
args(1:2:2*Nfun) = repmat({nIndex},1,Nfun);
loglog(args{:});
legend('Dynamic Allocation (for)','Pre-Allocation (for)','Dyanmic Allocation (while)','Pre-Allocation (while)','Location','Northwest');
grid('on');
axis([nIndex(1),nIndex(end),0.95*min([times{:}]),1.05*max([times{:}])]);
xlabel('Number of Array Elements [-]');
ylabel('Elasped Time [s]');
% Switch to linear scale near allocation max and only look at for-functions
Nsamp = 50 ;
nIndex = round(10.^linspace(log10(Nalloc)-1.5,log10(Nalloc),Nsamp)) ;
mstart = repmat({zeros(1,Nsamp)},1,Nfun/2) ;
mend = repmat({zeros(1,Nsamp)},1,Nfun/2) ;
% Array growth memory-data
for k = 1:numel(funs)/2
f = funs{k};
for m = 1:Nsamp
[~,mstart{k}(m),mend{k}(m)] = f(x(1:nIndex(m)));
fprintf(['Iteration ',num2str(m,'%02G'),' of function ',num2str(k),' done.\n']);
end
end
% Plot
figure(2);
mem = [mstart,mend];
args(2:2:2*Nfun) = mem ;
args(1:2:2*Nfun) = repmat({nIndex},1,Nfun);
h = plot(args{:});
set(h([1,2]),'LineStyle','--');
set(h([1,3]),'Color','k');
set(h([2,4]),'Color','r');
legend('Dynamic Allocation Start','Pre-Allocation Start','Dynamic Allocation End','Pre-Allocation End','Location','Northwest');
grid('on');
axis([nIndex(1),nIndex(end),0.95*min([mem{:}]),1.05*max([mem{:}])]);
xlabel('Number of Array Elements [-]');
ylabel('Matlab Memory Usage [MB]');
end
function y = burden(x)
y = besselj(0,x);
end
function mem = getMemory()
mem = memory();
mem = mem.MemUsedMATLAB/1E6; %[MB]
end
function [y,mstart,mend] = forDynamic(x)
mstart = getMemory();
n = numel(x) ;
for k = 1:n
y(k) = burden(x(k));
end
mend = getMemory();
end
function [y,mstart,mend] = forAllocate(x)
mstart = getMemory();
n = numel(x) ;
y(1,n) = 0 ;
for k = 1:numel(x)
y(k) = burden(x(k));
end
mend = getMemory();
end
function [y,mstart,mend] = whileDynamic(x)
mstart = getMemory();
n = numel(x) ;
k = 1 ;
while k <= n
y(k) = burden(x(k)) ;
k = k + 1 ;
end
mend = getMemory();
end
function [y,mstart,mend] = whileAllocate(x)
mstart = getMemory();
n = numel(x) ;
k = 1 ;
y(1,n) = 0 ;
while k <= n
y(k) = burden(x(k)) ;
k = k + 1 ;
end
mend = getMemory();
end
Upvotes: 2
Reputation: 114320
MATLAB works with fixed-size memory chunks. When you create a new matrix, say with the zeros function, you allocate just enough for the matrix and its meta-data. If you look at the notation MATLAB uses to append to a matrix, it almost explains itself:
>> a = zeros(1, 3)
a =
0 0 0
>> a = [a 1]
a =
0 0 0 1
You created a new matrix [a 1]
(which by the way is an alias for horzcat), then stored it in a
. However, you did not need to store it back in a
, in which case you would have both matrices floating around in memory. This happens every time you concatenate matrices of any size.
Another indication of what happens is the matrix size listed in the variable inspector. If you notice, it never claims to allocate anything besides just enough space for the data and meta-data.
The purpose of this design is pretty straightforward. MATLAB works with matrices. Matrices normally do not change much, but they do often get used to create new matrices. Having fixed size arrays and a good allocation mechanism is therefore much more important that anticipating a user's desire to expand arrays. The MATLAB profiler will tell you as much if you try putting a matrix concatenation into a loop. Experimenting will show you that the profiler is not lying. It is simply not the purpose of the language.
All of the above also happens to apply to the numpy
arrays in Python, except that it is better documented.
On a related note, keep in mind that MATLAB is well-integrated with Java, so if you need to have a well managed, expandable array, you can use java.util.ArrayList directly in MATLAB.
Upvotes: 3
Reputation: 5672
I recall at the Matlab Expo a few years ago they had a talk about things under development at HQ - one of which was automatic pre-allocation of memory.
There was no mention as to when this might be released or even if it would ever be.... And I have never heard mention of it since...
From my experience - I have always managed the dynamic allocation myself - and always noticed serious slow down if that part of the code had issues (i.e. the arrays were growing when I thought they shouldn't be...)
So I think its fair to say that you need to manage it yourself.
Upvotes: 4