Comparing cells in MATLAB

Let strCellArr be a 4000*1 array of cells. Each cell is a string.

What is the fastest way to tell whether every cell contains a string of length 100?

In other words, I want something that does the same thing as

a= true;
For (i =0; i =length(strCellArr); i++)
    if length(strCellArr{i}) ~= 100
         a = false;
    end
end

A related question:

I can convert the array to a 4000*100 array of characters with

charArr = char(strCellArr);

However, this pads any line shorter than 100 characters with whitespace. So if line 34 has only 30 characters, then

charArr(34, 50)

will return a whitespace.

How do I check that every character is one of a fixed set, in my case A, T, C, or G? Is there a way to do it without using a for loop?

Upvotes: 2

Views: 186

Answers (3)

Rody Oldenhuis

Reputation: 38032

Ooooh I just love questions that start like "what is the fastest way to do ..."

Here are a few alternatives, and a comparison:

% Initialize

map = 'CATG';     
strCellArr = cellfun(@(x) map(randi(4,100,1)),cell(4000,1), 'UniformOutput', false);


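% Your original method
% (for iterates over columns, so the 4000x1 cell array is transposed
% to visit every cell instead of running a single iteration)
tic
a = true;
for el = strCellArr.'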
    if length(el{1}) ~= 100
         a = false;
         break;
    end
end
toc

% My solution 
tic
a = all(cellfun('length', strCellArr) == 100);
toc

% Dang Khoa's method
tic
a = all( cellfun(@(x) length(x) == 100, strCellArr) );
toc

% Engineero's method
tic
a = all(cellfun(@length, strCellArr) == 100);
toc

Results:

Elapsed time is 0.001158 seconds. % loop
Elapsed time is 0.000455 seconds. % cellfun; string argument
Elapsed time is 0.031897 seconds. % cellfun; anonymous function
Elapsed time is 0.006994 seconds. % cellfun; function handle

Little-known fact: the string inputs to cellfun refer to functions built directly into the cellfun binary, so they do not require evaluating an anonymous function or function handle. In other words, cellfun does not have to make a pass through the MATLAB interpreter on each iteration, making this raging fast :)
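As a minimal sketch of the two calling styles (the three-string array c below is made-up data for illustration, not the array from the question), the legacy name 'length' and the handle @length produce the same numbers; only the way cellfun reaches the function differs:

```matlab
% Hypothetical toy data, not the 4000x1 array from the question
c = {'ACGT'; 'AC'; ''};

% Legacy string syntax: handled inside cellfun itself
lenBuiltin = cellfun('length', c);

% Function-handle syntax: cellfun calls length once per cell
lenHandle = cellfun(@length, c);

% Both calls return a 3x1 numeric vector with the same values
sameResult = isequal(lenBuiltin, lenHandle);

% 'isempty' is another of the legacy names cellfun accepts
emptyMask = cellfun('isempty', c);
```

As far as I know, the legacy names are kept for backward compatibility and do not call overloaded versions of the function, which should only matter if the cells hold objects of a class that overloads length.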

Now, the second part of your question:

% Engineero
tic
A = 'ATCG';
all(all(ismember(char(strCellArr), A)));
toc

% My solution 
tic
C = char(strCellArr);
~any(C(:)==' ');
toc

Results:

Elapsed time is 0.061168 seconds. % ismember
Elapsed time is 0.005098 seconds. % direct comparison to whitespace

This difference arises because ismember is implemented in MATLAB m-code and is riddled with code intended for user-friendliness (error checks, warnings, etc.), elaborate generalizations, loop structures, and many other things that together add up to a performance penalty.

Since we know beforehand that only spaces will be added to the array upon casting it to char, we do not have to explicitly check for occurrences of 'A', 'C', 'T', 'G', but only for their absence, provided the strings themselves contain nothing but those four letters. Meaning, just look for those spaces :)
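When the strings themselves might hold other characters, a direct comparison against the four letters covers both the alphabet and the length in one pass, still without ismember. This is only a sketch; the sample data, expectedLen, and the other variable names are illustrative assumptions:

```matlab
% Hypothetical sample: the third string contains an invalid 'N'
strCellArr = {'ACGT'; 'TTGA'; 'ACNT'};
expectedLen = 4;   % would be 100 for the data in the question

C = char(strCellArr);   % pads shorter rows with spaces

% A padded space is not A, C, G or T, so short rows fail this test too
isBase = (C == 'A') | (C == 'C') | (C == 'G') | (C == 'T');

% The width check covers the case where every row has the same wrong length
allValid = (size(C, 2) == expectedLen) && all(isBase(:));

% Row indices of the offending strings
badRows = find(~all(isBase, 2));
```

The elementwise comparisons avoid the overhead of ismember at the cost of hard-coding the alphabet, which seems a fair trade for a four-letter set.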

Needless to say, these times are all virtually negligible, and this is all more mental masturbation than actually really useful. But it's fun! :)

Upvotes: 6

Dang Khoa

Reputation: 5823

For your first question: this sounds like a job for cellfun. cellfun lets you operate on each cell in a cell array. (As an aside, arrayfun lets you do the same but on a regular array). (Note that your original code is not MATLAB syntax, particularly the for loop.)

So you could do something like

res = cellfun(@(x) length(x) == 100, strCellArr);

Here, res will be a logical array, since the == condition evaluates to 0 or 1 for each cell. Then, you can see if all of the results are 1, i.e., all the strings in strCellArr are of length 100:

a = all(res);
if a == 0
    disp('One or more strings do not have 100 characters!');
else
    disp('All strings have 100 characters!');
end
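Because res holds one logical value per cell, it can also report which strings fail, not just whether any do. A small sketch with made-up data (the three-string array and the target length of 4 are assumptions for illustration):

```matlab
% Hypothetical toy data; the question's array would use 100 instead of 4
strCellArr = {'ACGT'; 'AC'; 'TTGA'};
res = cellfun(@(x) length(x) == 4, strCellArr);

% Indices of the cells whose length differs from the target
badIdx = find(~res);

% Number of offending strings
numBad = nnz(~res);
```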

Upvotes: 3

Engineero

Reputation: 12908

For your first question, you could use all(cellfun(@length, strCellArr) == 100), which will return 1 for "true" if every string in the cell array has a length of 100.

For your second question, you may be able to use all(ismember(charArr(:), A)) where A = ['A', 'T', 'C', 'G']. The (:) flattens the character matrix first; on a matrix, all works column by column and returns a row vector rather than a single value. See the documentation on all and ismember for more info.

Upvotes: 3
