bzak
bzak

Reputation: 563

How to avoid the loop to reduce the computation time of this code?

how to avoid the loop to reduce the computation time of this code (one solution of my last question):

I hope to find the column vectors of A(1:3,:) whose corresponding values in M(4,:) are not part of one of the vectors of the cell X (and obviously not equal to one of these vectors). I look for a fast solution if X is very large.

M = [1007  1007  4044  1007  4044  1007  5002 5002 5002 622 622;
      552   552   300   552   300   552   431  431  431 124 124; 
     2010  2010  1113  2010  1113  2010  1100 1100 1100  88  88;
        7    12    25    15    12    30     2   10   55  32  12];

Here I take directly A:

A = [1007  4044  5002  622;
      552   300   431  124;
     2010  1113  1100   88];

A contains unique column vectors of M(1:3,:)

X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};

[~, ~, subs] = unique(M(1:3,:)','rows');

A4 = accumarray(subs(:),M(4,:).',[],@(x) {x});

%// getting a mask of which columns we want
idxC(length(A4)) = false;
for ii = 1:length(A4)
    idxC(ii) = ~any(cellfun(@(x) all(ismember(A4{ii},x)), X));
end

Displaying the columns we want

out = A(:,idxC)

Results:

>> out

out =

    1007        4044
     552         300
    2010        1113

the column vector [5002;431;1100] was eliminated because [2;10;55] is contained in X{2} = [2 10 55 9 17]

the column vector [622;124;88] was eliminated because [32 12] = X{4}

Another example: with the same X

    M = [1007  4044  1007  4044  1007  5002 5002 5002 622 622  1007  1007  1007;
          552   300   552   300   552   431  431  431 124 124   552    11    11; 
         2010  1113  2010  1113  2010  1100 1100 1100  88  88  2010    20    20;
           12    25    15    12    30     2   10   55  32  12     7    12     7];

X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};

A = [1007  4044  5002  622  1077;
      552   300   431  124    11;
     2010  1113  1100   88    20];

Results: (with scmg answer)

I get if A sorted according to the first row: (correct result)

out =

         1007        1007        4044
           11         552         300
           20        2010        1113

if I do not sort the matrix A, I get: (false result)

out =

        4044        5002         622
         300         431         124
        1113        1100          88

the column vector A(:,4) = [622;124;88] should be eliminated because [32 12] = X{4}.

the column vector [5002;431;1100] should be eliminated because [2;10;55] is contained in X{2} = [2 10 55 9 17]

Upvotes: 14

Views: 544

Answers (4)

Divakar
Divakar

Reputation: 221524

Shot #1

Listed in this section is an approach that is supposed to be a quick and direct approach to solve our case. Please note that since A is the matrix of unique columns from M considering upto the third row, it is skipped here as the input because we generate it internally with the solution code. This is maintained in the next approach/shot as well. Here's the implementation -

function out = shot1_func(M,X)

%// Get unique columns and corresponding subscripts
[unqrows, ~, subs_idx] = unique(M(1:3,:)','rows');
unqcols = unqrows.'; %//'

counts = accumarray(subs_idx(:),1); %// Counts of each unique subs_idx

%// Modify each cell of X based on their relevance with the fourth row of M
X1 = cellfun(@(x) subs_idx(ismember(M(4,:),x)),X,'Uni',0);

lensX = cellfun('length',X1); %// Cell element count of X1

Xn = vertcat(X1{:}); %// Numeric array version of X
N = max(subs_idx);   %// Number of unique subs_idx

%// Finally, get decision mask to select the correst columns from unqcols
sums = cumsum(bsxfun(@eq,Xn,1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);

mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).';    %//'
out = unqcols(:,~decision_mask);

return

Shot #2

The earlier mentioned approach might have a bottleneck at :

cellfun(@(x) subs_idx(ismember(M4,x)),X,'Uni',0)

So, alternatively to keep performance as a good motivation, one can separate out the whole process into two stages. The first stage could take care of cells of X that are not repeated in the fourth row of M, which could be implemented with a vectorized approach and another stage solving for the rest of X's cells with our slower cellfun based approach.

Thus, the code would bloat out a bit, but hopefully would be better with performance. The final implementation would look something like this -

%// Get unique columns and corresponding subscripts
[unqrows, ~, subs_idx] = unique(M(1:3,:)','rows')
unqcols = unqrows.' %//'
counts = accumarray(subs_idx,1);

%// Form ID array for X
lX = cellfun('length',X)
X_id = zeros(1,sum(lX))
X_id([1 cumsum(lX(1:end-1)) + 1]) = 1
X_id = cumsum(X_id)

Xr = cellfun(@(x) x(:).',X,'Uni',0); %//'# Convert to cells of row vectors
X1 = [Xr{:}]                         %// Get numeric array version

%// Detect cells that are to be processed by part1 (vectorized code)
[valid,idx1] = ismember(M(4,:),X1)
p1v = ~ismember(1:max(X_id),unique(X_id(accumarray(idx1(valid).',1)>1))) %//'

X_part1 = Xr(p1v)
X_part2 = Xr(~p1v)

%// Get decision masks from first and second passes and thus the final output
N = size(unqcols,2);
dm1 = first_pass(X_part1,M(4,:),subs_idx,counts,N)
dm2 = second_pass(X_part2,M(4,:),subs_idx,counts)
out = unqcols(:,~dm1 & ~dm2)

Associated functions -

function decision_mask = first_pass(X,M4,subs_idx,counts,N)

lensX = cellfun('length',X)'; %//'# Get X cells lengths
X1 = [X{:}];                  %// Extract cell data from X

%// Finally, get the decision mask
vals = changem(X1,subs_idx,M4) .* ismember(X1,M4);

sums = cumsum(bsxfun(@eq,vals(:),1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).';    %//'
return


function decision_mask = second_pass(X,M4,subs_idx,counts)

%// Modify each cell of X based on their relevance with the fourth row of M
X1 = cellfun(@(x) subs_idx(ismember(M4,x)),X,'Uni',0);

lensX = cellfun('length',X1); %// Cell element count of X1

Xn = vertcat(X1{:}); %// Numeric array version of X
N = max(subs_idx);   %// Number of unique subs_idx

%// Finally, get decision mask to select the correst columns from unqcols
sums = cumsum(bsxfun(@eq,Xn,1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);

mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).';       %//'

return

Verficication

This section lists code to verify the output. Here's the code to do so to verify the shot #1 code -

%// Setup inputs and output
load('matrice_data.mat');   %// Load input data
X = cellfun(@(x) unique(x).',X,'Uni',0); %// Consider X's unique elements
out = shot1_func(M,X); %// output with Shot#1 function

%// Accumulate fourth row data from M based on the uniqueness from first 3 rows
[unqrows, ~, subs] = unique(M(1:3,:)','rows');    %//'
unqcols = unqrows.';                              %//'
M4 = accumarray(subs(:),M(4,:).',[],@(x) {x});    %//'
M4 = cellfun(@(x) unique(x),M4,'Uni',0);

%// Find out cells in M4 that correspond to unique columns unqcols
[unqcols_idx,~] = find(pdist2(unqcols.',out.')==0);

%// Finally, verify output
for ii = 1:numel(unqcols_idx)
    for jj = 1:numel(X)
        if all(ismember(M4{unqcols_idx(ii)},X{jj}))
            error('Error: Wrong output!')
        end
    end
end
disp('Success!')

Upvotes: 3

Ikaros
Ikaros

Reputation: 1048

The answer of Ben Voigt is great, but the line for A4i = A4{ii} is the one causing issues : the for loop doesn't work this way with column vectors :

%row vector
for i = 1:3
    disp('foo');
end

    foo
    foo
    foo

%column vector
for i = (1:3).'
    disp('foo');
end

    foo

Just try for A4i = A4{ii}.' instead and it should get your work done!

Now, if we look at the output :

A(:,idxC) =

    4044        5002
     300         431
    1113        1100

As you can see, the final result is not what we expected.

As long as unique does a kind of sort, the subs are not numbered by the order of encounter in A, but by order of encounter in C (which is sorted) :

subs =

 2
 2
 3
 2
 3
 2
 4
 4
 4
 1
 1

Therefore you should pass by the matrix given by unique rather than A to get your final output

Enter

[C, ~, subs] = unique(M(1:3,:)','rows'); 
%% rather than [~, ~, subs] = unique(M(1:3,:)','rows');

Then, to get the final output, enter

>> out = C(idxC,:).'
out =

        1007        4044
         552         300
        2010        1113

Upvotes: 4

Ben Voigt
Ben Voigt

Reputation: 283614

In this case, you should not be trying to eliminate loops. The vectorization is actually hurting you badly.

In particular (giving a name to your anonymous lambda)

issubset = @(x) all(ismember(A4{ii},x))

is ridiculously inefficient, because it doesn't short-circuit. Replace that with a loop.

Same for

any(cellfun(issubset, X))

Use an approach similar to this instead:

idxC = true(size(A4));
NX = numel(X);
for ii = 1:length(A4)
    for jj = 1:NX
        xj = X{jj};
        issubset = true;
        for A4i=A4{ii}
            if ~ismember(A4i, xj)
                issubset = false;
                break;
            end;
        end;
        if issubset
            idxC(ii) = false;
            break;
        end;
    end;
end;

The two break statements, and especially the second one, trigger an early exit that potentially saves you a huge amount of computation.

Upvotes: 4

scmg
scmg

Reputation: 1894

Maybe you can use 2 times cellfun:

idxC = cellfun(@(a) ~any(cellfun(@(x) all(ismember(a,x)), X)), A4, 'un', 0);
idxC = cell2mat(idxC);
out = A(:,idxC)

Upvotes: 1

Related Questions