Reputation: 197
Multiplications of submatricies in matlab seems to be materially slower than multiplication of the matrix they are drawn from.
>> m = rand(1e7, 40);
>> tic; m' * m; toc % (1)
Elapsed time is 0.245803 seconds.
>> tic; m(1:2.5e6, :)' * m(1:2.5e6, :); toc % (2)
Elapsed time is 1.810981 seconds.
>> tic; t = m(1:2.5e6, :); t' * t; toc % (3)
Elapsed time is 0.885764 seconds.
I had hoped that there would be some fast builtin methods for this, since the data is already in memory, but it seems there's no way to prevent matlab from making intermediate copies. (3) is faster but garuntees that we make a copy.
Is there any technique in matlab to make operations (e.g. multiplcation, transpose) as quick on the subset of a matrix as they are the whole?
Is the only fast way to achieve this to with mex?
Edit: Using column major data speeds things up across the board (as expected), but is still much slower when multiplying the submatricies.
>> m = rand(40,1e7);
>> tic; m * m'; toc
Elapsed time is 0.251958 seconds.
>> tic; m(:, 1:2.5e6) * m(:, 1:2.5e6)'; toc
Elapsed time is 1.461321 seconds.
>> tic; s=m(:, 1:2.5e6); s * s'; toc
Elapsed time is 0.555667 seconds.
Clearly matlab is taking copies when indexing, but is there any way to prevent this (it's clear such a multiplication algorithm can exist without copying the data, I just want to know if it's expressible in matlab).
Upvotes: 3
Views: 182
Reputation: 60695
MATLAB does copy-on-write, meaning that copying a matrix doesn't really copy the data, at least until you try to change one of the two matrices that point to the same data. However, when you copy a part of a matrix, the data is always copied. That is, a matrix never points to a subset of values of a different matrix. MATLAB always stores data consecutively in memory (column-major ordering), and so it would rarely be the case that the part of the matrix that you index is consecutive in memory. One way to try this out is to use format debug
and look at data pointers:
>> format debug
>> q=rand(10,100)
q =
Structure address = 127e64560
m = 10
n = 100
pr = 7f9f5c835420
pi = 0
Columns 1 through 9
0.8147 0.1576 0.6557 0.7060 0.4387 0.2760 0.7513 0.8407 0.3517
...
>> q(:,1:2)
ans =
Structure address = 11dadd750
m = 10
n = 2
pr = 7f9f5b792700
pi = 0
0.8147 0.1576
..
Note how the pr
value (pointer to the real-valued portion of the data) changes. This shows that the data is copied.
If you cannot use vectorized code (i.e. apply the operation on the matrix as a whole), your best bet for speeding up computation is a MEX-file.
Upvotes: 1
Reputation: 2993
Here is more test results.
Five methods of performing matrix multiplication has been tested, 3 of which are involving submatrices. As @Cris has pointed out, there may be difference if using rand(1e7,40)
, so another whole set is tested.
A loop of 10 repeats and Profiler have been used to improve accuracy. Tested on a i7 with abundant RAM.
% clear;clc;close all
A = rand(1e7, 40);
for ii = 1:10
m = A;
m(end) = m(end-1);
m' * m; % 1
n = m';
n * m; % 2
m(1:2.5e6, :)' * m(1:2.5e6, :); % 3
t = m(1:2.5e6, :);
t' * t; % 4
t2 = t';
t2 * t; % 5
clearvars -except A
end
clearvars A
B = rand(40, 1e7);
for ii = 1:10
m = B;
m(end) = m(end-1);
m * m'; % 1
n = m';
m * n; % 2
m(:,1:2.5e6) * m(:,1:2.5e6)'; % 3
t = m(:,1:2.5e6);
t * t'; % 4
t2 = t';
t * t2; % 5
clearvars -except B
end
40x2.5e6
one is clearly faster. m'*m
it identifies the pattern and skips the transpose operation. Upvotes: 2