
Extract data from multidimensional array into 2 dims based on index

I have a huge (1000000x100x7) matrix and I need to create a (1000000x100x1) matrix based on an index vector (100x1) that holds a value from 1 to 7 for each location.

I do not want to use loops.

Upvotes: 1

Answers (1)

KQS


The problem (I think)

First, let me try to create a minimal working example that I think captures what you want to do. You have a matrix A and an index vector index:

A = rand(1000000, 100, 7);   % note: roughly 5.6 GB as doubles
index = randi(7, [100, 1]);  % one slice number (1..7) per column j

And you would like to do something like this:

[I,J,K] = size(A);
B = zeros(I,J);              % preallocate the 2-D result
for i=1:I
    for j=1:J
        % for column j, take the element from slice index(j)
        B(i,j) = A(i,j,index(j));
    end
end

Only you'd like to do so without the loops.

Linear indexing

One way to do this is with linear indexing. It's a bit tricky because it depends on how the matrix is laid out in memory, and I won't do a great job explaining it, so you may also want to check out the documentation for the sub2ind and ind2sub functions.
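To make the sub2ind / ind2sub relationship a bit more concrete, here is a small sketch on a toy 4x3x2 array. The sizes and the names A_small, lin and lin_manual are made up for illustration and are not part of the question:

```matlab
% Toy array (sizes chosen only for illustration)
I = 4; J = 3; K = 2;
A_small = reshape(1:I*J*K, [I, J, K]);

% Pick one element by its three subscripts
i = 2; j = 3; k = 2;

% sub2ind converts the subscripts into a single linear index ...
lin = sub2ind(size(A_small), i, j, k);

% ... which should match the column-major formula written out by hand
lin_manual = i + I*((j-1) + J*(k-1));

% ind2sub goes the other way, from linear index back to subscripts
[i2, j2, k2] = ind2sub(size(A_small), lin);

isequal(lin, lin_manual)                 % expected to be true
isequal(A_small(i,j,k), A_small(lin))    % expected to be true
isequal([i2 j2 k2], [i j k])             % expected to be true
```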

In short, because your (1,000,000 x 100 x 7) matrix is stored in column-major order, you can refer to the same element in several different ways:

A(i, j, k)
A(i, j + 100*(k-1))
A(i + 1000000*(j-1 + 100*(k-1)))

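all refer to the same element of the matrix. The second form is the useful one here: a single combined index j + 100*(k-1) covers the last two dimensions at once. So the punchline is: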

% column j of slice index(j) sits at combined position j + J*(index(j)-1)
linear_index = (1:J)' + J*(index-1);
B_noloop = A(:, linear_index);

And of course we should verify that this produces the same answer:

>> isequal(B, B_noloop)
ans =
     1

Yay!
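If the hand-written offset arithmetic feels fragile, the same combined column index can probably be built with sub2ind over the last two dimensions instead. This is only a sketch on a reduced-size array; the sizes and the names col_index, B_sub2ind and B_ref are assumptions chosen so that it runs quickly:

```matlab
% Reduced-size stand-in for the real data (sizes are illustrative only)
I = 1000; J = 100; K = 7;
A = rand(I, J, K);
index = randi(K, [J, 1]);

% Treat dimensions 2 and 3 as one combined J*K dimension and let
% sub2ind compute the position of (j, index(j)) within it
col_index = sub2ind([J, K], (1:J)', index);
B_sub2ind = A(:, col_index);

% Reference result from a plain loop over the columns
B_ref = zeros(I, J);
for j = 1:J
    B_ref(:, j) = A(:, j, index(j));
end

isequal(B_sub2ind, B_ref)   % expected to be true
```

The result should be identical to the manual formula, since sub2ind([J, K], j, k) evaluates to j + J*(k-1); the only gain is that the intent is spelled out.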

Performance vs. readability

Testing this on my computer, the nested loops took 5.37 seconds and the no-loop version took 0.29 seconds. However, it's kinda hard to tell at a glance what the no-loop code is doing. Perhaps a more reasonable compromise would be:

B_oneloop = zeros(I,J);
for j=1:J
    B_oneloop(:,j) = A(:,j,index(j));
end

This vectorizes the longest dimension of the matrix and thus gets most of the speedup (0.43 seconds), while keeping the readability of the original code.
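If you want to repeat the comparison on your own machine, something like the following rough sketch should work. It uses a smaller first dimension than the question (an assumption, so the array fits comfortably in memory), and the timings it prints will differ from the numbers quoted above:

```matlab
% Smaller first dimension than in the question (illustrative only)
I = 10000; J = 100; K = 7;
A = rand(I, J, K);
index = randi(K, [J, 1]);

% No-loop version: one indexing operation over the combined (j,k) dimension
tic
linear_index = (1:J)' + J*(index-1);
B_noloop = A(:, linear_index);
t_noloop = toc;

% One-loop version: vectorized over the long first dimension only
tic
B_oneloop = zeros(I, J);
for j = 1:J
    B_oneloop(:, j) = A(:, j, index(j));
end
t_oneloop = toc;

fprintf('no loop:  %.4f s\n', t_noloop);
fprintf('one loop: %.4f s\n', t_oneloop);
isequal(B_noloop, B_oneloop)   % both approaches should agree
```

A single tic/toc run is noisy, so treat the output as a rough indication only; wrapping each variant in a function and passing it to timeit should give steadier numbers.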

Upvotes: 3
