Reputation: 5409
I have a cell array in Matlab:
strings = {'one', 'two', 'three'};
How can I efficiently calculate the length of all three strings? Right now I use a for loop:
lengths = zeros(3,1);
for i = 1:3
lengths(i) = length(strings{i});
end
This is however unusable slow when you have a large amount of strings (I've got 480,863 of them). Any suggestions?
Upvotes: 2
Views: 5264
Reputation: 20915
You can also use:
cellfun(@length, strings)
It will not be faster, but makes the code clearer.
Regarding the slowness, you should first run the profiler to check where the bottleneck is. Only then should you optimize.
Edit: I just recalled that 'length' used to be a built-in function in cellfun in older Matlab versions. So it might actually be faster! Try
cellfun('length',strings)
Edit(2) : I have to admit that my first answer was a wild guess. Following @Rodin s comment, I decided to check out the speedup.
Here is the code of the benchmark:
First, the code that generates a lot of strings and saves to disk:
function GenerateCellStrings()
strs = cell(1,10000);
for i=1:10000
strs{i} = GenerateRandomString();
end
save strs;
end
function st = GenerateRandomString()
MAX_STR_LENGTH = 1000;
n = randi(MAX_STR_LENGTH);
st = char(randi([97 122], 1,n ));
end
Then, the benchmark itself:
function CheckRunTime()
load strs;
tic;
disp('Loop:');
for i=1:numel(strs)
n = length(strs{i});
end
toc;
disp('cellfun (String):');
tic;
cellfun('length',strs);
toc;
disp('cellfun (function handle):');
tic;
cellfun(@length,strs);
toc;
end
And the results are:
Loop:
Elapsed time is 0.010663 seconds.
cellfun (String):
Elapsed time is 0.000313 seconds.
cellfun (function handle):
Elapsed time is 0.006280 seconds.
Wow!! The 'length' syntax is about 30 times faster than a loop! I can only guess why it becomes so fast. Maybe the fact that it recognizes length
specifically. Might be JIT optimization.
Edit(3) - I found out the reason for the speedup. It is indeed recognition of length
specifically. Thanks to @reve_etrange for the info.
Upvotes: 9
Reputation: 3968
Keep an array of the lengths of said strings, and update that array when you update the strings. This will allow you O(1) time access to string lengths. Since you are updating it at the same time you generate or load strings, it shouldn't slow things down much, since integer array operations are (generally) faster than string operations.
Upvotes: 2