Reputation: 408
I have a product of two matrices, H.Z, where H and Z both have size (256,256). Z is a permutation matrix with the following pattern: in the first 32 rows, only columns 1, 9, 17, ..., 249 are non-zero and all other columns are zero; in the next 32 rows, only columns 2, 10, 18, ..., 250 are non-zero; and so on until the last 32 rows, where only columns 8, 16, 24, ..., 256 are non-zero.
Therefore, multiplying H by Z only requires multiplying the first 32 elements of the first row of H with the first 32 elements of column 1 of Z, then the next 32 elements of the first row of H with the next 32 elements (elements 33-64) of column 2 of Z, and so on, because all other multiplications give zero. That way, fewer multiplications are needed.
My problem is that I couldn't write this in MATLAB. I don't know how to create a loop that goes through only the non-zero elements. Could you please help with that?
Thank you in advance.
Upvotes: 0
Views: 367
Reputation: 1808
For loops are generally much slower than built-in MATLAB operations. One alternative is to multiply only the non-zero elements of Z, using logical indexing as follows.
result = zeros(256,256);
result(Z ~= 0) = H(Z ~= 0) .* Z(Z ~= 0);
You can see the complete code below, running a test to make sure it gets the right answer, and timing the code to see if it's faster.
% setup variables
H = rand(256,256);
Z = zeros(256,256);
for i = 1:8
    Z((i-1)*32+1:i*32, i:8:256) = 1;
end
% run calculations and check that they are equal
HZ1 = f1(H, Z);
HZ2 = f2(H, Z);
are_equal = all(all(HZ1 == HZ2));
% time both functions
timeit(@() f1(H,Z))
timeit(@() f2(H,Z))
function result = f1(H, Z)
    result = H .* Z;
end
function result = f2(H, Z)
    result = zeros(256,256);
    result(Z ~= 0) = H(Z ~= 0) .* Z(Z ~= 0);
end
Timeit results:
f1 - 6.875835e-05 s
f2 - 0.0008205853 s
Unfortunately, the new approach is about 12 times slower than just multiplying the matrices elementwise. This is because MATLAB is heavily optimised for operations on whole matrices, and multiplying the complete matrices H and Z ensures that the memory being operated on is contiguous.
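If what is needed is the matrix product of H and Z (as described in the question) rather than the element-wise product, the block structure can still be used without looping over individual elements. The following is only a sketch, assuming Z has exactly the row/column pattern built in the setup code above. The names n, nb, bs, rows, cols and max_err are introduced here for illustration, and the rand(bs) fill is placeholder data. For block i, only rows (i-1)*32+1 to i*32 of Z contribute to columns i, i+8, ..., so each group of 32 output columns needs one 256x32 by 32x32 product.

```matlab
% Sketch (assumed pattern): matrix product H*Z using only the non-zero blocks of Z
n  = 256;          % matrix size
nb = 8;            % number of row blocks
bs = n / nb;       % block size (32 rows per block)

H = rand(n);
Z = zeros(n);
for i = 1:nb
    rows = (i-1)*bs+1 : i*bs;
    Z(rows, i:nb:n) = rand(bs);   % placeholder non-zero values in the pattern
end

HZ = zeros(n);
for i = 1:nb
    rows = (i-1)*bs+1 : i*bs;     % rows of Z that are non-zero for this block
    cols = i:nb:n;                % columns of Z that are non-zero for this block
    % only these rows of Z contribute to these columns of the product
    HZ(:, cols) = H(:, rows) * Z(rows, cols);
end

% compare against the full product; differences should be rounding error only
max_err = max(max(abs(HZ - H*Z)));
```

This uses 8 small matrix products instead of one full one, so it performs fewer multiplications, but that does not guarantee it runs faster than plain H*Z. As with f1 and f2 above, it would need to be measured with timeit.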
Upvotes: 3