Reputation: 1140
Let strCellArr be a 4000*1 array of cells. Each cell is a string.
What is the fastest way to tell whether every cell holds a string of length 100?
In other words, I want something that does the same thing as
a= true;
For (i =0; i =length(strCellArr); i++)
if length(strCellArr{i}) ~= 100
a = false;
end
end
A related question:
I can convert the array to a 4000*100 array of characters with
charArr = char(strCellArr);
However, this pads every line that has fewer than 100 characters with whitespace. So if line 34 has only 30 characters, then
charArr(34, 50)
will return a whitespace.
How do I check that every character is one of a fixed set of characters, in my case A, T, C, or G? Is there a way to do it without using a for loop?
Upvotes: 2
Views: 186
Reputation: 38032
Ooooh I just love questions that start like "what is the fastest way to do ..."
Here are a few alternatives, and a comparison:
% Initialize
map = 'CATG';
strCellArr = cellfun(@(x) map(randi(4,100,1)),cell(4000,1), 'UniformOutput', false);
% Your original method
tic
a = true;
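% NB: for iterates over columns; strCellArr is 4000x1, so this body runs only once (use strCellArr.' to visit every cell)
for el = strCellArr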
if length(el{1}) ~= 100
a = false;
break;
end
end
toc
% My solution
tic
a = all(cellfun('length', strCellArr) == 100);
toc
% Dang Khoa's method
tic
a = all( cellfun(@(x) length(x) == 100, strCellArr) );
toc
% Engineero's method
tic
a = all(cellfun(@length, strCellArr) == 100);
toc
Results:
Elapsed time is 0.001158 seconds. % loop
Elapsed time is 0.000455 seconds. % cellfun; string argument
Elapsed time is 0.031897 seconds. % cellfun; anonymous function
Elapsed time is 0.006994 seconds. % cellfun; function handle
Little known fact: the string inputs to cellfun refer to functions built directly into the cellfun binary, and therefore do not require the evaluation of an anonymous function. In other words, cellfun does not have to make a pass through the MATLAB interpreter on each of its iterations, making this raging fast :)
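A minimal sketch of the two calling styles, using made-up toy data (the variable names are illustrative only). The string names 'length' and 'isempty' are among the legacy names cellfun accepts; both styles return the same values, only the way the function is dispatched differs:

```matlab
% Toy data (hypothetical): three strings of different lengths
strs = {'ACGT'; 'AC'; ''};

% Legacy string-name form: handled inside cellfun itself
lenByName   = cellfun('length', strs);
emptyByName = cellfun('isempty', strs);

% Function-handle form: length is called once per cell
lenByHandle = cellfun(@length, strs);

% Both forms agree on the result
sameResult = isequal(lenByName, lenByHandle);
```

The string form is kept for backward compatibility, so it only works for the handful of names cellfun knows about; anything else needs a function handle.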
Now, the second part of your question:
% Engineero
tic
A = 'ATCG';
all(all(ismember(char(strCellArr), A)));
toc
% My solution
tic
C = char(strCellArr);
~any(C(:)==' ');
toc
Results:
Elapsed time is 0.061168 seconds. % ismember
Elapsed time is 0.005098 seconds. % direct comparison to whitespace
This difference arises because ismember is implemented in MATLAB m-code, and is riddled with code intended for user-friendliness (error checks, errors, warnings, etc.), elaborate generalizations, loop structures, and many other things that together add up to a performance penalty.
Since we know beforehand that only spaces will be added to the array upon casting it to char, we do not have to explicitly check for occurrences of 'A', 'C', 'T', 'G', but only for their absence. Meaning, just look for those spaces :)
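If the strings themselves might contain something other than A, C, G or T, the space test alone will not catch it. A hedged sketch of a loop-free check by direct comparison, on hypothetical toy inputs (isDna and the sample arrays are names invented for this example):

```matlab
% Hypothetical toy inputs
good    = {'ACGT'; 'TTGA'};
shorter = {'ACGT'; 'ACG'};    % char() pads the second string with a space
bad     = {'ACGT'; 'ACGX'};   % contains a character outside the alphabet

% True only if every character of the padded char matrix is A, C, G or T
isDna = @(c) all(c(:) == 'A' | c(:) == 'C' | c(:) == 'G' | c(:) == 'T');

okGood    = isDna(char(good));
okShorter = isDna(char(shorter));
okBad     = isDna(char(bad));
```

Because the padding space is not in the alphabet, this also flags strings that are shorter than the longest one.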
Needless to say, these times are all virtually negligible, and this is all more mental masturbation than actually really useful. But it's fun! :)
Upvotes: 6
Reputation: 5823
For your first question: this sounds like a job for cellfun. cellfun lets you operate on each cell in a cell array. (As an aside, arrayfun lets you do the same but on a regular array.) (Note that your original code is not MATLAB syntax, particularly the for loop.)
So you could do something like
res = cellfun(@(x) length(x) == 100, strCellArr);
Here, res will be a logical array, since the == condition evaluates to 0 or 1 for each cell. Then, you can see if all of the results are 1, i.e., all the strings in strCellArr are of length 100:
a = all(res);
if a == 0
disp('One or more strings do not have 100 characters!');
else
disp('All strings have 100 characters!');
end
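Since res holds one logical per cell, it can also tell you which strings are off. A small sketch on toy data, with a target length of 4 instead of 100 to keep it short (badIdx and badStrs are names made up here):

```matlab
% Toy data (hypothetical), target length 4
strCellArr = {'ACGT'; 'AC'; 'TTGA'; 'G'};

res = cellfun(@(x) length(x) == 4, strCellArr);

badIdx  = find(~res);           % positions of strings with the wrong length
badStrs = strCellArr(badIdx);   % the offending strings themselves
```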
Upvotes: 3
Reputation: 12908
For your first question, you could use all(cellfun(@length, strCellArr) == 100), which returns 1 ("true") if every element of the cell array has a length of 100.
For your second question, you may be able to use all(all(ismember(charArr, A))) where A = ['A', 'T', 'C', 'G']. The nested all is needed because all works column by column on a matrix. See the documentation on all and ismember for more info.
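A short sketch of that check on a padded character array, with toy data. Note that all reduces a matrix one column at a time, so the mask is flattened with (:) to get a single answer:

```matlab
A = 'ATCG';
charArr = char({'ACGT'; 'ACG'});   % 2x4 char; the second row is padded with a space

mask      = ismember(charArr, A);  % 2x4 logical, false at the padding space
perColumn = all(mask);             % 1x4 result, one value per column
overall   = all(mask(:));          % single logical for the whole array
```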
Upvotes: 3