Reputation: 121
So in Matlab, say I have a matrix X with size N by N, and i is an logical index vector with size 1 by N. Then I can do
sum(X(i,i))
The problem is it is equivalent to first allocating memory for
Y=X(i,i),
then computing the sum on Y, and deleting Y. Am I right? (Hoki's answer shows that it is right.)
Is there a faster way to compute the sum without (implicitly) creating Y? In case that Y is big, lots of time can be consumed in memory operations. In other words, is it possible to do something like the following:
S=zeros(1,nnz(i));
for k=find(i)
for j=find(i)
S(k)=S(k)+X(j,k);
end
end
In this way, all the memory we need aside from X is a vector S - we don't need to allocate memory for the big Y. Of course, the loop can be slow, but you get my idea.
Upvotes: 0
Views: 852
Reputation: 11792
You assume too much on how memory management is operated.
Timing:
I ran a benchmark with timeit. From N=10 to N=20000 there is absolutely no noticeable difference in the execution time of both forms.
Moreover, I still get the very same results if I turn the JIT acceleration off ... so the optimization may be just the result of Matlab lazy-copy
behaviour.
Memory usage:
On the memory side, there seem to be a difference though. The indirect method (with the temporary variable) seem to allocate the memory for this temporary variable (The size allocated correspond exactly to the size of the temporary variable). On the other hand, the direct method does not need any extra memory allocation to return a result.
This reaches the limits of my grasp on these things. I am not expert enough to pretend to explain why this difference in memory usage does not result in a timing difference. I know memory is fast but for high order of N
I thought it would have made a difference. Apparently not ...
More info:
For more details on Matlab memory management, I invite you to read this article from Loren at Matlab:
Memory Management for Functions and Variables
or if you want to read a more in-depth testing of the mechanism:
Internal Matlab memory optimizations
Benchmark result:
Benchmark code:
function ExecTimes = benchmark_sumcol
%// prepare logarithmic progression (up to what my 16GB RAM can take)
nOrder = (1:9).' * 10.^(1:3) ; nOrder = [nOrder(:) ; 10000 ; 20000] ; %'
npt = numel(nOrder) ;
ExecTimes = zeros( npt , 2 ) ;
for k = 1:npt
%// Sample data
N = nOrder(k) ;
X = rand(N) ;
ci = logical(randi([0 1],1,N)) ;
%// Benchmark
f1 = @() direct_sum(X,ci) ;
f2 = @() indirect_sum(X,ci) ;
ExecTimes(k,1) = timeit( f1 ) ;
ExecTimes(k,2) = timeit( f2 ) ;
clear X ci
disp(N)
end
function R = direct_sum(X,ci)
R = sum(X(:,ci)) ;
function R = indirect_sum(X,ci)
Y = X(:,ci) ;
R = sum(Y) ;
Code for memory benchmark
%% // set profiler options
clear all
profile('-memory','on');
setpref('profiler','showJitLines',1);
profile on
%% // sample data
N = 1000 ;
X = rand(N) ;
ci = logical(randi([0 1],1,N)) ;
%% // Benchmark
R2 = bench_indirect_sum(X,ci) ;
R1 = bench_direct_sum(X,ci) ;
%% // result
profile viewer
p = profile('info');
profsave(p,'profile_results')
I added your loop
version to the tests, although I had to rework it a bit to make it actually work (and give the same results than the others):
function R = bench_loop_sum(X,ci)
R = zeros(1,nnz(ci));
idxRes=1 ;
for k=find(ci)
for j=1:size(X,1)
R(idxRes)=R(idxRes)+X(j,k);
end
idxRes = idxRes+1 ;
end
The result is ok in terms of memory (that is no extra memory allocation for a temporary array), but is catastrophic in terms of speed:
And as we could expect with loops, even worse with the JIT off:
Now a simple change to suppress the inner loop makes things a lot better, but still a bit behind the direct way (note that this version do not allocate memory for a temporary column to do the sum):
function R = bench_loop_sum(X,ci)
R = zeros(1,nnz(ci));
idxRes=1 ;
for k=find(ci)
R(idxRes) = sum(X(:,k));
idxRes = idxRes+1 ;
end
with the JIT on.
Upvotes: 3
Reputation: 1056
There are two answers , if you will always be looking for complete columns, the answer is simple
t=sum(X);
is a row with the sum of all the columns
then
ans=sum(t(i))
is what you want.
if you might be looking for odd shapes linear indices might be what you are looking for.
See sub2ind
First create a linear index into the matrix (a 1D indexing) then use that index directly
usage sum over six items (5 through 10) in columns i
ind = sub2ind(size(X) , ones(6,1)* i , (5:10)'*ones(1,N)) ;
sum(X(ind))
Upvotes: 1