Reputation: 5579
I have two cell arrays of strings, and I would like to reorder the second one so that it matches the order of the first.
A = {'a';'b';'c'}
B = {'b';'a';'c'}
idx = [2,1,3] % TO FIND
B=B(idx);
I would like a way to compute idx.
...
Upvotes: 2
Views: 66
Reputation: 112759
This seems to be significantly faster than using ismember (although admittedly less clear than @rayryeng's answer). Thanks to @Divakar for his correction on this answer.
[~, indA] = sort(A);
[~, indB] = sort(B);
idx = zeros(size(indA));
idx(indA) = indB; %// B(idx) now equals A, assuming both hold the same unique strings
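A minimal check of the sort-based idea on a permutation that is not its own inverse. The example data are hypothetical, and the sketch assumes both arrays hold the same unique strings. Since A(indA) and B(indB) are the same sorted list, the k-th sorted element sits at position indA(k) in A and at position indB(k) in B, so idx is built by scattering indB into the positions indA:

```matlab
% Hypothetical example data: same unique strings, neither array sorted
A = {'c';'a';'b'};
B = {'b';'c';'a'};

[~, indA] = sort(A);      % A(indA) is the sorted list
[~, indB] = sort(B);      % B(indB) is the same sorted list

idx = zeros(size(indA));  % preallocate as a column vector
idx(indA) = indB;         % element A{indA(k)} is found at B{indB(k)}

isequal(B(idx), A)        % true when B(idx) reproduces A
```

Composing the two index vectors directly only gives the right answer for special cases (for instance when A is already sorted), which is why the scatter assignment is used here.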
Upvotes: 2
Reputation: 104555
Use the second output of ismember. ismember tells you whether or not the values in the first set appear anywhere in the second set. The second output gives, for each value in the first set, its location in the second set, or 0 if it is not found. As such:
A = {'a';'b';'c'}
B = {'b';'a';'c'}
[~,idx] = ismember(A, B);
Note that there is a minor typo where you declared your cell arrays. You have a colon in between b and c for A, and between a and c for B. I placed a semi-colon there in both for correctness.
Therefore, we get:
idx =
2
1
3
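As a small illustration of how the second output behaves when A contains something that B lacks, here is a sketch with hypothetical data in which 'd' and 'e' are unmatched. The logical first output is used to skip the zero entries before indexing, since indexing with 0 is an error:

```matlab
A = {'a';'b';'c';'d'};        % 'd' is a hypothetical element absent from B
B = {'b';'a';'c';'e'};

[tf, idx] = ismember(A, B);   % tf(i) is true when A{i} occurs in B; idx(i) is 0 otherwise

matched = B(idx(tf));         % index only with the non-zero locations
isequal(matched, A(tf))       % true: the matched entries of B line up with A
```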
We have three very good algorithms here. As such, let's see how they perform with a benchmark. I generate a random character array of 10000 lowercase letters, then wrap it in a cell array of 10000 cells, where each cell holds a single character. I construct A this way, and B is a random permutation of the elements of A. This is the code I wrote to do this:
letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, 1, ones(10000,1)); %// lettersMat is 1 x 10000, so split along columns
B = A(randperm(10000));
Now... here comes the testing code:
clear all;
close all;
letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, 1, ones(10000,1));
B = A(randperm(10000));
tic;
[~,idx] = ismember(A,B);
t = toc;
fprintf('ismember: %f\n', t);
clear idx; %// Make sure test is unbiased
tic;
[~,idx] = max(bsxfun(@eq,char(B),char(A)')); %// rows follow B, columns follow A
t = toc;
fprintf('bsxfun: %f\n', t);
clear idx; %// Make sure test is unbiased
tic;
[~, indA] = sort(A);
[~, indB] = sort(B);
idx(indA) = indB; %// scatter B's sort order into A's positions
t = toc;
fprintf('sort: %f\n', t);
This is what I get for timing:
ismember: 0.058947
bsxfun: 0.110809
sort: 0.006054
Luis Mendo's approach is the fastest, followed by ismember, and then finally bsxfun. For code compactness, ismember is preferred, but for performance, sort is better. Personally, I think bsxfun should win because it's such a nice function to use ;).
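A single tic/toc run like the one above can be noisy. One possible alternative is timeit, which calls the function several times and reports a typical time. The sketch below assumes timeit is available (R2013b or later) and that script-local functions are supported (R2016b or later); sortReorder is a hypothetical helper name, not something from the answers:

```matlab
% Sketch only: same kind of data as the test above, timed with timeit
rng(123);                                        % fixed seed for reproducibility
letters = char(97 + (0:25));
A = num2cell(letters(randi(26, [1, 10000])));    % 1 x 10000 cell of single characters
B = A(randperm(10000));

tIsmember = timeit(@() ismember(A, B), 2);       % request both outputs of ismember
tBsxfun   = timeit(@() max(bsxfun(@eq, char(B), char(A)')), 2);
tSort     = timeit(@() sortReorder(A, B));       % hypothetical helper defined below

fprintf('ismember: %f\n', tIsmember);
fprintf('bsxfun: %f\n', tBsxfun);
fprintf('sort: %f\n', tSort);

function idx = sortReorder(A, B)
    % Sort both arrays, then scatter B's sort order into A's positions
    [~, indA] = sort(A);
    [~, indB] = sort(B);
    idx = zeros(size(indA));
    idx(indA) = indB;
end
```

The absolute numbers will differ by machine and MATLAB release, so only the relative ordering is worth comparing.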
Upvotes: 3
Reputation: 221684
I had to jump in, as it seems runtime performance could be a criterion here :)
Assuming that you are dealing with scalar strings (one character in each cell), here's my take. It works even when A and B have elements that are not common to both, and it uses the very powerful bsxfun, so I am really hoping it will be runtime-efficient -
[v,idx] = max(bsxfun(@eq,char(B),char(A)')); %// rows follow B, columns follow A
idx = v.*idx
Example -
A =
'a' 'b' 'c' 'd'
B =
'b' 'a' 'c' 'e'
idx =
2 1 3 0
For the specific case where every element of A also appears in B, it becomes a one-liner -
[~,idx] = max(bsxfun(@eq,char(B),char(A)'))
Example -
A =
'a' 'b' 'c'
B =
'b' 'a' 'c'
idx =
2 1 3
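On MATLAB R2016b and later, implicit expansion lets == broadcast on its own, so the same comparison can be written without bsxfun. This is a sketch under the same single-character assumption. The comparison is oriented with B along the rows and A along the columns, so that the result gives the position in B of each element of A (that is, B(idx) lines up with A); the variable names match and found are illustrative:

```matlab
A = {'a','b','c','d'};               % single-character cells, as assumed above
B = {'b','a','c','e'};

% Entry (i,j) is true when B{i} equals A{j}
match = char(B) == char(A).';        % implicit expansion, R2016b or later

[found, idx] = max(match, [], 1);    % first matching row of B for each element of A
idx = found .* idx                   % zero out elements of A that are absent from B
```

For this data the result is 2 1 3 0, the same as in the first example above.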
Upvotes: 2