cloakedlearning

Reputation: 197

Matlab submatrix multiplication poor performance

Multiplication of submatrices in MATLAB seems to be materially slower than multiplication of the full matrix they are drawn from.

>> m = rand(1e7, 40);
>> tic; m' * m; toc % (1)
Elapsed time is 0.245803 seconds.
>> tic; m(1:2.5e6, :)' * m(1:2.5e6, :); toc % (2)
Elapsed time is 1.810981 seconds.
>> tic; t = m(1:2.5e6, :); t' * t; toc % (3)
Elapsed time is 0.885764 seconds.

I had hoped that there would be some fast built-in method for this, since the data is already in memory, but it seems there's no way to prevent MATLAB from making intermediate copies. (3) is faster, but it guarantees that we make a copy.

Is there any technique in MATLAB to make operations (e.g. multiplication, transpose) as quick on a subset of a matrix as they are on the whole matrix?

Is the only fast way to achieve this with MEX?

Edit: Using column-major data speeds things up across the board (as expected), but multiplying the submatrices is still much slower.

>> m = rand(40,1e7);
>> tic; m * m'; toc
Elapsed time is 0.251958 seconds.
>> tic; m(:, 1:2.5e6) * m(:, 1:2.5e6)'; toc
Elapsed time is 1.461321 seconds.
>> tic; s=m(:, 1:2.5e6); s * s'; toc
Elapsed time is 0.555667 seconds.

Clearly MATLAB is making copies when indexing, but is there any way to prevent this? (It's clear such a multiplication algorithm can exist without copying the data; I just want to know whether it's expressible in MATLAB.)

Upvotes: 3

Views: 182

Answers (2)

Cris Luengo

Reputation: 60695

MATLAB does copy-on-write, meaning that copying a matrix doesn't really copy the data, at least until you try to change one of the two matrices that point to the same data. However, when you copy a part of a matrix, the data is always copied. That is, a matrix never points to a subset of values of a different matrix. MATLAB always stores data consecutively in memory (column-major ordering), and so it would rarely be the case that the part of the matrix that you index is consecutive in memory. One way to try this out is to use format debug and look at data pointers:

>> format debug
>> q=rand(10,100)
q =
Structure address = 127e64560
m = 10
n = 100
pr = 7f9f5c835420
pi = 0
  Columns 1 through 9
    0.8147    0.1576    0.6557    0.7060    0.4387    0.2760    0.7513    0.8407    0.3517
    ...

>> q(:,1:2)
ans =
Structure address = 11dadd750
m = 10
n = 2
pr = 7f9f5b792700
pi = 0
  0.8147    0.1576
  ..

Note how the pr value (pointer to the real-valued portion of the data) changes. This shows that the data is copied.
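
The copy-on-write half of the story is easy to see with timings as well. A minimal sketch (variable names are just illustrative): assigning the whole matrix to a new name is essentially free, indexing even a contiguous block of columns pays for a full copy, and the deferred deep copy only happens on the first write to the shared array:

>> m = rand(40, 1e7);
>> tic; a = m; toc                % shares the data (copy-on-write), essentially instant
>> tic; b = m(:, 1:2.5e6); toc    % indexing always allocates and copies the block
>> tic; a(1) = 0; toc             % first write to a triggers the real deep copy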

If you cannot use vectorized code (i.e. apply the operation on the matrix as a whole), your best bet for speeding up computation is a MEX-file.
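
As an illustration only (the function name below is hypothetical, and the C source is not shown): in the column-major layout from the question's edit, the first 2.5e6 columns of a 40-by-1e7 matrix form one contiguous block of memory, so a C MEX gateway could pass that pointer and a leading dimension of 40 straight to a BLAS routine such as dsyrk without copying anything. From MATLAB it might be used like this:

% submatrix_AtA is a hypothetical MEX function; it is assumed to call BLAS
% dsyrk on the first k columns of m in place, without copying them.
mex submatrix_AtA.c -lmwblas          % compile once, linking against MATLAB's bundled BLAS
m = rand(40, 1e7);
C = submatrix_AtA(m, 2.5e6);          % intended to equal m(:,1:2.5e6) * m(:,1:2.5e6)'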

Upvotes: 1

Yvon

Reputation: 2993

Here are more test results.

Five methods of performing the matrix multiplication have been tested, three of which involve submatrices. As @Cris has pointed out, there may be a difference when using rand(1e7,40), so a second whole set, using rand(40,1e7), is tested as well.

A loop of 10 repeats and the Profiler have been used to improve accuracy. Tested on an i7 with abundant RAM.

Test code

% clear;clc;close all

A = rand(1e7, 40);
for ii = 1:10
    m = A;
    m(end) = m(end-1);

    m' * m; % 1

    n = m';
    n * m; % 2

    m(1:2.5e6, :)' * m(1:2.5e6, :); % 3

    t = m(1:2.5e6, :);
    t' * t; % 4

    t2 = t';
    t2 * t; % 5

    clearvars -except A
end
clearvars A
B = rand(40, 1e7);
for ii = 1:10
    m = B;
    m(end) = m(end-1);

    m * m'; % 1

    n = m';
    m * n; % 2

    m(:,1:2.5e6) * m(:,1:2.5e6)'; % 3

    t = m(:,1:2.5e6);
    t * t'; % 4

    t2 = t';
    t * t2; % 5

    clearvars -except B
end

Profiler result

[Profiler screenshot: total time for each method]

Comments

  1. Any time difference less than 10% is considered identical.
  2. At the start of each loop, the matrix is forced to copy by a trivial assignment. The copy is pretty slow (11.5 sec).
  3. For the complete matrix, no difference is found between MxN and NxM dimensions. However for the sub-matrix test, the 40x2.5e6 one is clearly faster.
  4. Buffering the transpose slows down the entire process by 100%. I think it's because MATLAB loses the ability to optimize; probably, if one types m'*m, it identifies the pattern and skips the explicit transpose operation (see the timeit sketch after this list).
  5. Subsetting and then transposing is not optimized.
  6. Buffering the subset does speed up the multiplication.
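
For comment 4, a minimal timeit sketch (independent of the Profiler run above) should show whether the m'*m form is indeed about twice as fast as multiplying by an explicitly buffered transpose, which would be consistent with MATLAB recognizing the X'*X pattern:

m = rand(1e7, 40);
f1 = @() m' * m;    % the X'*X form, which comment 4 suspects MATLAB special-cases
n  = m';
f2 = @() n * m;     % same product, but with the transpose materialized first
timeit(f1)
timeit(f2)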

Upvotes: 2
