Reputation: 5579

Matlab, order cells of strings according to the first one

I have two cell arrays of strings, and I would like to reorder the second one so that it matches the order of the first.

A = {'a';'b';'c'}
B = {'b';'a';'c'}

idx = [2,1,3] % TO FIND

B=B(idx);

I am looking for a way to compute idx automatically.

Upvotes: 2

Answers (3)

Luis Mendo

Reputation: 112759

This seems to be significantly faster than using ismember (although admittedly less clear than @rayryeng's answer). With thanks to @Divakar for his correction on this answer.

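[~, indA] = sort(A);   % A(indA) is A in sorted order
[~, indB] = sort(B);   % B(indB) is the same sorted list
idx(indA) = indB;      % A(indA(k)) and B(indB(k)) are the same string, so B(idx) matches A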
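
To see why sorting both arrays gives the mapping: A(indA) and B(indB) are the same sorted list, so position indA(k) in A and position indB(k) in B hold the same string. Below is a minimal check of that idea on a permutation that is not its own inverse. The data is made up for illustration; the sketch scatters indB into the positions given by indA and verifies that B(idx) reproduces A.

```matlab
% Illustrative data only: B is a reordering of A that is not self-inverse
A = {'b'; 'c'; 'a'};
B = {'a'; 'b'; 'c'};

[~, indA] = sort(A);        % A(indA) is the sorted list
[~, indB] = sort(B);        % B(indB) is the same sorted list

idx = zeros(numel(A), 1);   % preallocate as a column
idx(indA) = indB;           % A(indA(k)) and B(indB(k)) are the same string

disp(isequal(B(idx), A))    % check that reordering B gives back A
```

Because sort is stable, the same reasoning should also hold when strings repeat, as long as A and B contain the same strings the same number of times.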

Upvotes: 2

rayryeng

Reputation: 104555

Use the second output of ismember. The first output tells you whether each value in the first input appears anywhere in the second input. The second output gives, for each of those values, its index in the second input, or 0 if it is not found. As such:

A = {'a';'b';'c'}
B = {'b';'a';'c'}
[~,idx] = ismember(A, B);

Note that there was a minor typo in your original cell array declarations: a colon between b and c for A, and between a and c for B. I replaced both with semicolons for correctness.

Therefore, we get:

idx =

  2
  1
  3
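
As a quick sanity check (a sketch, not part of the original answer), applying idx to B should reproduce A. The second half uses a hypothetical element 'z' to show that ismember reports 0 for anything in A that is missing from B, so those entries need to be filtered out before indexing. The names A2, idx2, found and Bmatched are made up for this example.

```matlab
A = {'a'; 'b'; 'c'};
B = {'b'; 'a'; 'c'};
[~, idx] = ismember(A, B);
isSame = isequal(B(idx), A);      % true when B(idx) matches A

% Hypothetical case: 'z' does not occur in B
A2 = {'a'; 'z'; 'c'};
[found, idx2] = ismember(A2, B);  % found is logical, idx2 is 0 where not found
Bmatched = B(idx2(found));        % keep only the entries that were located
```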

Benchmarking

We have three very good algorithms here, so let's benchmark them. I generate a random character array of 10000 lowercase letters and split it into a 10000-element cell array, where each cell holds a single character. I construct A this way, and B is a random permutation of the elements of A. This is the code that sets up the data:

letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, 1, ones(10000,1)); %// lettersMat is a 1 x 10000 row, so split along columns
B = A(randperm(10000));

Now... here comes the testing code:

clear all;
close all;

letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, 1, ones(10000,1));
B = A(randperm(10000));

tic;
[~,idx] = ismember(A,B);
t = toc;

fprintf('ismember: %f\n', t);

clear idx; %// Make sure test is unbiased

tic;
[~,idx] = max(bsxfun(@eq,char(B),char(A)')); %// Rows run over B, columns over A
t = toc;

fprintf('bsxfun: %f\n', t);

clear idx; %// Make sure test is unbiased

tic;
[~, indA] = sort(A);
[~, indB] = sort(B);
idx(indA) = indB; %// Same sorted position in A and B means same string
t = toc;

fprintf('sort: %f\n', t);

This is what I get for timing:

ismember: 0.058947
bsxfun: 0.110809
sort: 0.006054

Luis Mendo's approach is the fastest, followed by ismember, and then finally bsxfun. For code compactness, ismember is preferred but for performance, sort is better. Personally, I think bsxfun should win because it's such a nice function to use ;).
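
A single tic/toc measurement can be noisy, since it includes one-off costs such as warm-up. As a hedged alternative (not what produced the timings above), timeit, which is available in newer MATLAB releases, runs each candidate several times and reports a typical time. The sketch below reuses the benchmark data and only times the two sort calls for the sort-based approach, ignoring the cheap final indexing step. The variable names tIsmember and tSort are made up here.

```matlab
% Same test data as in the benchmark above
letters = char(97 + (0:25));
rng(123);                                     % set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);                    % 1 x 10000 char row
A = mat2cell(lettersMat, 1, ones(10000,1));   % 1 x 10000 cell of single chars
B = A(randperm(10000));

% Passing 2 requests two outputs, so the index output is actually computed
tIsmember = timeit(@() ismember(A, B), 2);
tSort     = timeit(@() sort(A), 2) + timeit(@() sort(B), 2);

fprintf('ismember: %f\n', tIsmember);
fprintf('sort (two calls): %f\n', tSort);
```

Absolute numbers depend on the machine and MATLAB release, so only the relative ordering is worth comparing.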

Upvotes: 3

Divakar

Reputation: 221684

I had to jump in as it seems runtime performance could be a criterion here :)

Assuming that you are dealing with single-character strings (one character in each cell), here's my take. It works even when A and B have elements that are not common to both, and it uses the very powerful bsxfun, so I am really hoping this would be runtime-efficient -

[v,idx] = max(bsxfun(@eq,char(B),char(A)')); % rows run over B, columns over A
idx = v.*idx

Example -

A = 
    'a'    'b'    'c'    'd'
B = 
    'b'    'a'    'c'    'e'
idx =
     2     1     3     0

For the specific case where every element of A also appears in B, it becomes a one-liner -

[~,idx] = max(bsxfun(@eq,char(B),char(A)')) % rows run over B, columns over A

Example -

A = 
    'a'    'b'    'c'
B = 
    'b'    'a'    'c'
idx =
     2     1     3
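
Here is a small check of the comparison-matrix idea on a permutation that is not its own inverse (illustrative data, not from the examples above). In this sketch the rows of the logical matrix run over B and the columns over A, so the row of the first true entry in each column is the position in B of the corresponding element of A. The names match and found are made up for the example.

```matlab
A = {'b', 'c', 'a'};
B = {'a', 'b', 'c'};

% match(j, i) is true when B{j} equals A{i}.
% On R2016b and later, char(B) == char(A)' builds the same matrix.
match = bsxfun(@eq, char(B), char(A)');

found = any(match, 1);           % which elements of A occur in B at all
[~, idx] = max(match, [], 1);    % first matching row in each column
idx(~found) = 0;                 % 0 marks elements of A that are absent from B

disp(isequal(B(idx), A))         % check that reordering B gives back A
```

Keep in mind that the matrix has numel(A)*numel(B) entries, so memory use grows quadratically with the input size.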

Upvotes: 2
