Reputation: 563
how to avoid the loop to reduce the computation time of this code (one solution of my last question):
I hope to find the column vectors of A(1:3,:)
whose corresponding values in M(4,:)
are not part of one of the vectors of the cell X
(and obviously not equal to one of these vectors). I look for a fast solution if X
is very large.
M = [1007 1007 4044 1007 4044 1007 5002 5002 5002 622 622;
552 552 300 552 300 552 431 431 431 124 124;
2010 2010 1113 2010 1113 2010 1100 1100 1100 88 88;
7 12 25 15 12 30 2 10 55 32 12];
Here I take directly A
:
A = [1007 4044 5002 622;
552 300 431 124;
2010 1113 1100 88];
A
contains unique column vectors of M(1:3,:)
X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};
[~, ~, subs] = unique(M(1:3,:)','rows');
A4 = accumarray(subs(:),M(4,:).',[],@(x) {x});
%// getting a mask of which columns we want
idxC(length(A4)) = false;
for ii = 1:length(A4)
idxC(ii) = ~any(cellfun(@(x) all(ismember(A4{ii},x)), X));
end
Displaying the columns we want
out = A(:,idxC)
Results:
>> out
out =
1007 4044
552 300
2010 1113
the column vector [5002;431;1100]
was eliminated because [2;10;55]
is contained in X{2} = [2 10 55 9 17]
the column vector [622;124;88]
was eliminated because [32 12] = X{4}
Another example: with the same X
M = [1007 4044 1007 4044 1007 5002 5002 5002 622 622 1007 1007 1007;
552 300 552 300 552 431 431 431 124 124 552 11 11;
2010 1113 2010 1113 2010 1100 1100 1100 88 88 2010 20 20;
12 25 15 12 30 2 10 55 32 12 7 12 7];
X = {[2 5 68 44],[2 10 55 9 17],[1 55 6 7 8 9],[32 12]};
A = [1007 4044 5002 622 1077;
552 300 431 124 11;
2010 1113 1100 88 20];
Results: (with scmg answer)
I get if A
sorted according to the first row: (correct result)
out =
1007 1007 4044
11 552 300
20 2010 1113
if I do not sort the matrix A
, I get: (false result)
out =
4044 5002 622
300 431 124
1113 1100 88
the column vector A(:,4) = [622;124;88]
should be eliminated because [32 12] = X{4}
.
the column vector [5002;431;1100]
should be eliminated because [2;10;55]
is contained in X{2} = [2 10 55 9 17]
Upvotes: 14
Views: 544
Reputation: 221524
Shot #1
Listed in this section is an approach that is supposed to be a quick and direct approach to solve our case. Please note that since A
is the matrix of unique columns from M
considering upto the third row, it is skipped here as the input because we generate it internally with the solution code. This is maintained in the next approach/shot as well. Here's the implementation -
function out = shot1_func(M,X)
%// Get unique columns and corresponding subscripts
[unqrows, ~, subs_idx] = unique(M(1:3,:)','rows');
unqcols = unqrows.'; %//'
counts = accumarray(subs_idx(:),1); %// Counts of each unique subs_idx
%// Modify each cell of X based on their relevance with the fourth row of M
X1 = cellfun(@(x) subs_idx(ismember(M(4,:),x)),X,'Uni',0);
lensX = cellfun('length',X1); %// Cell element count of X1
Xn = vertcat(X1{:}); %// Numeric array version of X
N = max(subs_idx); %// Number of unique subs_idx
%// Finally, get decision mask to select the correst columns from unqcols
sums = cumsum(bsxfun(@eq,Xn,1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).'; %//'
out = unqcols(:,~decision_mask);
return
Shot #2
The earlier mentioned approach might have a bottleneck at :
cellfun(@(x) subs_idx(ismember(M4,x)),X,'Uni',0)
So, alternatively to keep performance as a good motivation, one can separate out the whole process into two stages. The first stage could take care of cells of X
that are not repeated in the fourth row of M
, which could be implemented with a vectorized approach and another stage solving for the rest of X's
cells with our slower cellfun
based approach.
Thus, the code would bloat out a bit, but hopefully would be better with performance. The final implementation would look something like this -
%// Get unique columns and corresponding subscripts
[unqrows, ~, subs_idx] = unique(M(1:3,:)','rows')
unqcols = unqrows.' %//'
counts = accumarray(subs_idx,1);
%// Form ID array for X
lX = cellfun('length',X)
X_id = zeros(1,sum(lX))
X_id([1 cumsum(lX(1:end-1)) + 1]) = 1
X_id = cumsum(X_id)
Xr = cellfun(@(x) x(:).',X,'Uni',0); %//'# Convert to cells of row vectors
X1 = [Xr{:}] %// Get numeric array version
%// Detect cells that are to be processed by part1 (vectorized code)
[valid,idx1] = ismember(M(4,:),X1)
p1v = ~ismember(1:max(X_id),unique(X_id(accumarray(idx1(valid).',1)>1))) %//'
X_part1 = Xr(p1v)
X_part2 = Xr(~p1v)
%// Get decision masks from first and second passes and thus the final output
N = size(unqcols,2);
dm1 = first_pass(X_part1,M(4,:),subs_idx,counts,N)
dm2 = second_pass(X_part2,M(4,:),subs_idx,counts)
out = unqcols(:,~dm1 & ~dm2)
Associated functions -
function decision_mask = first_pass(X,M4,subs_idx,counts,N)
lensX = cellfun('length',X)'; %//'# Get X cells lengths
X1 = [X{:}]; %// Extract cell data from X
%// Finally, get the decision mask
vals = changem(X1,subs_idx,M4) .* ismember(X1,M4);
sums = cumsum(bsxfun(@eq,vals(:),1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).'; %//'
return
function decision_mask = second_pass(X,M4,subs_idx,counts)
%// Modify each cell of X based on their relevance with the fourth row of M
X1 = cellfun(@(x) subs_idx(ismember(M4,x)),X,'Uni',0);
lensX = cellfun('length',X1); %// Cell element count of X1
Xn = vertcat(X1{:}); %// Numeric array version of X
N = max(subs_idx); %// Number of unique subs_idx
%// Finally, get decision mask to select the correst columns from unqcols
sums = cumsum(bsxfun(@eq,Xn,1:N),1);
cumsums_at_shifts = sums(cumsum(lensX),:);
mask1 = any(bsxfun(@eq,diff(cumsums_at_shifts,[],1),counts(:).'),1); %//'
decision_mask = mask1 | cumsums_at_shifts(1,:) == counts(:).'; %//'
return
Verficication
This section lists code to verify the output. Here's the code to do so to verify the shot #1 code -
%// Setup inputs and output
load('matrice_data.mat'); %// Load input data
X = cellfun(@(x) unique(x).',X,'Uni',0); %// Consider X's unique elements
out = shot1_func(M,X); %// output with Shot#1 function
%// Accumulate fourth row data from M based on the uniqueness from first 3 rows
[unqrows, ~, subs] = unique(M(1:3,:)','rows'); %//'
unqcols = unqrows.'; %//'
M4 = accumarray(subs(:),M(4,:).',[],@(x) {x}); %//'
M4 = cellfun(@(x) unique(x),M4,'Uni',0);
%// Find out cells in M4 that correspond to unique columns unqcols
[unqcols_idx,~] = find(pdist2(unqcols.',out.')==0);
%// Finally, verify output
for ii = 1:numel(unqcols_idx)
for jj = 1:numel(X)
if all(ismember(M4{unqcols_idx(ii)},X{jj}))
error('Error: Wrong output!')
end
end
end
disp('Success!')
Upvotes: 3
Reputation: 1048
The answer of Ben Voigt is great, but the line for A4i = A4{ii}
is the one causing issues : the for loop doesn't work this way with column vectors :
%row vector
for i = 1:3
disp('foo');
end
foo
foo
foo
%column vector
for i = (1:3).'
disp('foo');
end
foo
Just try for A4i = A4{ii}.'
instead and it should get your work done!
Now, if we look at the output :
A(:,idxC) =
4044 5002
300 431
1113 1100
As you can see, the final result is not what we expected.
As long as unique
does a kind of sort, the subs are not numbered by the order of encounter in A, but by order of encounter in C (which is sorted) :
subs =
2
2
3
2
3
2
4
4
4
1
1
Therefore you should pass by the matrix given by unique
rather than A to get your final output
Enter
[C, ~, subs] = unique(M(1:3,:)','rows');
%% rather than [~, ~, subs] = unique(M(1:3,:)','rows');
Then, to get the final output, enter
>> out = C(idxC,:).'
out =
1007 4044
552 300
2010 1113
Upvotes: 4
Reputation: 283614
In this case, you should not be trying to eliminate loops. The vectorization is actually hurting you badly.
In particular (giving a name to your anonymous lambda)
issubset = @(x) all(ismember(A4{ii},x))
is ridiculously inefficient, because it doesn't short-circuit. Replace that with a loop.
Same for
any(cellfun(issubset, X))
Use an approach similar to this instead:
idxC = true(size(A4));
NX = numel(X);
for ii = 1:length(A4)
for jj = 1:NX
xj = X{jj};
issubset = true;
for A4i=A4{ii}
if ~ismember(A4i, xj)
issubset = false;
break;
end;
end;
if issubset
idxC(ii) = false;
break;
end;
end;
end;
The two break
statements, and especially the second one, trigger an early exit that potentially saves you a huge amount of computation.
Upvotes: 4
Reputation: 1894
Maybe you can use 2 times cellfun
:
idxC = cellfun(@(a) ~any(cellfun(@(x) all(ismember(a,x)), X)), A4, 'un', 0);
idxC = cell2mat(idxC);
out = A(:,idxC)
Upvotes: 1