Marijn van Vliet

Reputation: 5409

How to efficiently compute the string lengths of a cell array of strings

I have a cell array in Matlab:

strings = {'one', 'two', 'three'};

How can I efficiently calculate the length of all three strings? Right now I use a for loop:

lengths = zeros(3,1);
for i = 1:3
    lengths(i) = length(strings{i});
end

This is, however, unusably slow when you have a large number of strings (I've got 480,863 of them). Any suggestions?

Upvotes: 2

Views: 5264

Answers (2)

Andrey Rubshtein

Reputation: 20915

You can also use:

cellfun(@length, strings)

It will not be faster, but makes the code clearer.
Regarding the slowness, you should first run the profiler to check where the bottleneck is. Only then should you optimize.

Edit: I just recalled that 'length' used to be a built-in function in cellfun in older Matlab versions. So it might actually be faster! Try

 cellfun('length',strings)
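
For the example from the question, a minimal sketch of both call forms could look like this. The variable names viaHandle and viaName are mine, not from the question:

```matlab
strings = {'one', 'two', 'three'};

% Function-handle form: cellfun calls length once per cell.
viaHandle = cellfun(@length, strings);

% Legacy name form: 'length' is passed as a character vector.
viaName = cellfun('length', strings);

disp(viaHandle)                     % 3 3 5
disp(isequal(viaHandle, viaName))   % both forms give the same lengths
```

Both calls return a numeric array with one length per cell, so they can be swapped without changing the rest of the code.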

Edit(2): I have to admit that my first answer was a wild guess. Following @Rodin's comment, I decided to check out the speedup.

Here is the code of the benchmark:

First, the code that generates a lot of strings and saves to disk:

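function GenerateCellStrings()
    % Generate 10,000 random strings and save them to strs.mat.
    strs = cell(1,10000);
    for i=1:10000
        strs{i} = GenerateRandomString();
    end
    save('strs.mat', 'strs');   % save only the cell array
end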

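function st = GenerateRandomString()
    % Return a random lowercase string of 1 to MAX_STR_LENGTH characters.
    MAX_STR_LENGTH = 1000;
    n = randi(MAX_STR_LENGTH);
    st = char(randi([97 122], 1, n));   % 97..122 are the codes for 'a'..'z'
end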

Then, the benchmark itself:

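function CheckRunTime()
    % Load the cell array written by GenerateCellStrings.
    data = load('strs.mat');
    strs = data.strs;

    disp('Loop:');
    tic;   % start timing after disp so that printing is not measured
    for i=1:numel(strs)
        n = length(strs{i});
    end
    toc;

    disp('cellfun (String):');
    tic;
    cellfun('length',strs);
    toc;

    disp('cellfun (function handle):');
    tic;
    cellfun(@length,strs);
    toc;
end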

And the results are:

Loop:
Elapsed time is 0.010663 seconds.
cellfun (String):
Elapsed time is 0.000313 seconds.
cellfun (function handle):
Elapsed time is 0.006280 seconds.

Wow! The 'length' syntax is about 30 times faster than the loop, and about 20 times faster than the function-handle form. I can only guess why it is so fast: perhaps cellfun recognizes 'length' specifically, or it might be JIT optimization.

Edit(3) - I found out the reason for the speedup. It is indeed recognition of length specifically. Thanks to @reve_etrange for the info.
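
Single tic/toc measurements like the ones above are sensitive to warm-up and noise. As a hedged alternative, assuming MATLAB R2013b or later where timeit is available, each variant can be timed over several runs. The function CompareLengthTimings, the helper loopLengths and the synthetic data are my own illustration, not part of the original benchmark:

```matlab
function CompareLengthTimings()
    % Synthetic data: 10,000 random lowercase strings of length 1..1000.
    strs = arrayfun(@(k) char(randi([97 122], 1, randi(1000))), ...
                    1:10000, 'UniformOutput', false);

    % timeit calls each handle repeatedly and returns a typical time in seconds.
    tLoop   = timeit(@() loopLengths(strs));
    tName   = timeit(@() cellfun('length', strs));
    tHandle = timeit(@() cellfun(@length, strs));

    fprintf('Loop:                      %g s\n', tLoop);
    fprintf('cellfun (String):          %g s\n', tName);
    fprintf('cellfun (function handle): %g s\n', tHandle);
end

function lengths = loopLengths(strs)
    % Plain for loop with a preallocated result, as in the question.
    lengths = zeros(size(strs));
    for i = 1:numel(strs)
        lengths(i) = length(strs{i});
    end
end
```

The absolute numbers will differ between machines and MATLAB releases, so treat the output as a relative comparison only.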

Upvotes: 9

std''OrgnlDave

Reputation: 3968

Keep an array of the string lengths, and update that array whenever you update the strings. This gives you O(1) access to each string length. Because you update the array at the same time you generate or load the strings, it shouldn't slow things down much: integer array operations are (generally) faster than string operations.
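
A minimal sketch of this bookkeeping, assuming the strings are stored or changed one at a time in code you control. The names strs, lens and newStrings are hypothetical:

```matlab
% Stand-in for strings as they are generated or loaded.
newStrings = {'one', 'two', 'three'};

n = numel(newStrings);
strs = cell(1, n);    % the strings themselves
lens = zeros(1, n);   % lens(i) always mirrors numel(strs{i})

for i = 1:n
    strs{i} = newStrings{i};
    lens(i) = numel(strs{i});   % record the length at the moment of storing
end

% When a string is replaced later, update its length in the same step.
strs{2} = 'twenty';
lens(2) = numel(strs{2});

disp(lens)   % 3 6 5
```

Looking up lens(i) afterwards is a plain array index, so no pass over the cell array is needed when the lengths are requested.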

Upvotes: 2
