Reputation: 221
Given a matrix A, I need to multiply it with n other vectors Bi (i.e. i = 1...n). The size of A can be around 5000x5000, and thus each Bi is 5000x1.
If I evaluate the product in the following way:
for i = 1:n
    product = A*Bi;
    % do something with product
end
The result is way (orders of magnitude) slower than computing the products like:
% assume that S is a matrix that contains the vectors Bi as columns, i.e. S(:,i) = Bi, then:
results = A*S;   % stores all the products in matrix form
% do something with results
The problem is that the number n of vectors Bi may be too big to store in memory, for example n = 300000, so I need to use a loop approach where each time I evaluate the product, use it, and then discard the vector Bi.
Why is such an approach so slow compared to direct multiplication, and are there ways to overcome this?
Upvotes: 5
Views: 1153
Reputation: 66
In order to multiply each of the vectors with the matrix, just multiply the matrix with one matrix that has the vectors you want as its columns.
You can check this yourself: if size(a) is [3 3] (and b is 3-by-3 as well), then
a*b == horzcat(a*b(:,1), a*b(:,2), a*b(:,3))
holds (up to floating-point round-off).
This way you save a lot of time compared to the loop.
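A minimal sketch of that check; a and b here are just placeholder 3-by-3 matrices, and since the column-by-column and full products can differ by round-off, it compares with a tolerance rather than ==:
% column-by-column products vs. one matrix product; a and b are placeholders
a = rand(3,3);
b = rand(3,3);
col_by_col = horzcat(a*b(:,1), a*b(:,2), a*b(:,3));
max(max(abs(col_by_col - a*b)))   % should be 0 or on the order of eps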
Upvotes: 0
Reputation: 18177
In addition to @Dan's answer, you might try going parallel, provided you have enough cores and large enough operations to make it profitable (see this answer for more details on the memory consumption of parfor):
parfor ii = 0:(n/k)-1
    product = A*S(:,(ii*k+1):(ii+1)*k);
end
I can't see in the docs on mtimes (the * operator) whether it is implicitly multithreaded, but it's worth a shot I guess.
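A minimal sketch of the surrounding setup, assuming the Parallel Computing Toolbox is available and that k divides n; matlabpool was the pool command when this was written (newer releases use parpool and delete(gcp)):
% a pool must be open before parfor gives any speed-up;
% matlabpool on older releases, parpool on newer ones
matlabpool open                          % or: parpool
parfor ii = 0:(n/k)-1
    product = A*S(:,(ii*k+1):(ii+1)*k);
    % do something with product here, inside the loop body
end
matlabpool close                         % or: delete(gcp)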
Upvotes: 1
Reputation: 45741
You could try looping over batches, so for example:
for i = 0:(n/k)-1
    product = A*S(:,(i*k+1):(i+1)*k);
end
Then adjust k to find the best trade-off of speed and memory for you; the timing sketch below shows one way to compare a few values of k.
MATLAB's loops are slow because it is an interpreted language, so it has to work out a lot of stuff on the fly. Loops have improved greatly these days thanks to the JIT compiler, but they are still slow compared to the built-in functions, which are written and compiled in C. On top of that, the built-ins use cutting-edge, very fast matrix multiplication algorithms, compared with the fairly naive algorithm you get by looping, which also contributes to the speed-up you are seeing.
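A minimal timing sketch for picking k, assuming A and S already exist in memory and that each candidate k divides the number of columns of S evenly (the variable names are placeholders):
% time the batched loop for a few candidate batch sizes k
m = size(S, 2);
for k = [10 50 100 500]
    tic;
    for i = 0:(m/k)-1
        product = A*S(:,(i*k+1):(i+1)*k);
        % do something with product
    end
    fprintf('k = %4d: %.3f s\n', k, toc);
end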
Upvotes: 4
Reputation: 36710
For simplicity my answer assumes an n-by-n square matrix A, but the same holds for non-square matrices as well.
Your loop approach uses matrix-vector multiplication. There the naive algorithm is also the best known one, with a runtime of O(n^2); repeating it n times gives a total runtime of O(n^3).
For matrix-matrix multiplication there is a better approach: the best known algorithm needs a little less than O(n^2.4) runtime, making it much faster for large n.
You will therefore achieve a better runtime by multiplying multiple vectors Bi at once using matrix multiplication. This will not reach the performance of a single full matrix multiplication, but working with larger blocks of the Bi is probably the fastest memory-efficient solution.
Some code for the different approaches discussed:
n = 5000;
k = 100;
A = rand(n,n);
S = rand(n,n);
workers = matlabpool('size');   % size of the already-open pool (0 if none is open)
% for a parfor solution, the batch size must be smaller, because multiple batches are stored in memory at once
kparallel = k/workers;
disp('simple loop:');
tic;
for i = 1:n
    product = A*S(:,i);   % one matrix-vector product per column
end
toc
disp('batched loop:');
tic;
for i = 1:(n/k)
    product = A*S(:,(i-1)*k+1:i*k);
end
toc
disp('batched parfor loop:');
tic;
parfor i = 1:(n/kparallel)
    product = A*S(:,(i-1)*kparallel+1:i*kparallel);
end
toc
disp('matrix multiplication:');
tic;
A*S;
toc
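If S itself is too large to hold in memory (the n = 300000 case from the question), the same batching idea still applies; the following is only a sketch, where load_batch is a hypothetical helper standing in for however the Bi vectors are produced or read from disk:
% process the vectors in blocks of k columns when they do not all fit in memory;
% load_batch is a hypothetical function returning columns first..last as a 5000-by-k matrix
k = 1000;
for i = 0:(n/k)-1
    B_block = load_batch(i*k+1, (i+1)*k);
    product = A*B_block;                % one matrix-matrix product per block
    % do something with product, then let B_block be overwritten
end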
Upvotes: 3