Vivek Subramanian

String comparison on cell array of strings with MATLAB Coder

I am trying to use the MATLAB Coder toolbox to convert the following code into C:

function [idx] = list_iterator(compare, list)

idx = nan(length(list));
for j = 1:length(list)
    idx(j) = strcmp(compare, list{j});
end

list is an N×1 cell array of strings and compare is a string. The code compares each element of list to compare and stores 1 if the two are the same and 0 otherwise. (I'm doing this to speed up execution, because N can be quite large: around 10 to 20 million elements.)

When I run codegen list_iterator in the Command Window, I get the following error:

Type of input argument 'compare' for function 'list_iterator' not specified. Use -args or preconditioning statements to specify input types.

More information

Error in ==> list_iterator Line: 1 Column: 18

Code generation failed: View Error Report

Error using codegen

I know I'm supposed to specify the types of the inputs when using codegen, but I'm not sure how to do this for a cell array of strings whose elements can have different lengths. The string compare can also vary in length from one function call to the next.

Answers (1)

Ryan Livingston

You can use the function coder.typeof to specify variable-size inputs to codegen. From what I've understood of your example, something like:

>> compare = coder.typeof('a',[1,Inf])

compare = 

coder.PrimitiveType
   1×:inf char
>> list = coder.typeof({compare}, [Inf,1])

list = 

coder.CellType
   :inf×1 homogeneous cell 
      base: 1×:inf char
>> codegen list_iterator.m -args {compare, list}

seems appropriate.

The MATLAB Coder App provides a graphical way to specify these complicated inputs. From there, you can export the project to a build script to see the corresponding command-line APIs:

https://www.mathworks.com/help/coder/ug/generate-a-matlab-script-to-build-a-project.html?searchHighlight=build%20script&s_tid=doc_srchtitle

Note that when I tried this example with codegen, the resulting MEX function was not faster than MATLAB. This can happen when the body of the function is fairly simple but a large amount of data must be transferred from MATLAB to the generated code and back, so the data transfer overhead dominates the execution time. Moving more of your code into the generated MEX function may improve this.

Separately from codegen, for performance you should use idx = false(length(list),1); rather than idx = nan(length(list));. The former creates an N×1 logical vector, while the latter creates an N×N double matrix of which list_iterator only writes the first column.
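To make the size difference concrete, here is a small sketch (the variable names are illustrative, not from the original code). It checks the shapes for a 3-element list, then estimates storage for N = 10 million assuming 8 bytes per double and 1 byte per logical element. It only computes the numbers and does not try to allocate the large arrays.

```matlab
% Shape check on a small list: nan(n) with a scalar n is n-by-n, not n-by-1.
smallList = {'abcd'; 'a'; 'b'};
idxNaN   = nan(length(smallList));         % 3-by-3 double
idxFalse = false(length(smallList), 1);    % 3-by-1 logical
disp(size(idxNaN))     % 3 3
disp(size(idxFalse))   % 3 1

% Rough storage estimate for the large case in the question (N = 10 million).
N = 10e6;
bytesNaN   = N^2 * 8;  % N-by-N doubles, 8 bytes each
bytesFalse = N * 1;    % N-by-1 logicals, 1 byte each
fprintf('nan(N): %g TB, false(N,1): %g MB\n', bytesNaN / 1e12, bytesFalse / 1e6);
% prints: nan(N): 800 TB, false(N,1): 10 MB
```

At the sizes mentioned in the question, the N×N preallocation is not just slow but far beyond the memory of any ordinary machine.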

With your original code and the inputs compareIn = 'abcd'; listIn = repmat({'abcd';'a';'b'},1000,1); timeit gives:

>> timeit(@()list_iterator(compareIn, listIn))

ans =

    0.0257

Modifying your code to return an N×1 logical vector reduces that considerably:

function [idx] = list_iterator(compare, list)

idx = false(length(list),1);
for j = 1:length(list)
    idx(j) = strcmp(compare, list{j});
end

>> timeit(@()list_iterator(compareIn, listIn))

ans =

    0.0014

You can also pass the char array and the whole cell array to strcmp in a single call, which makes the code faster still:

function [idx] = list_iterator(compare, list)

idx = strcmp(compare, list);

>> timeit(@()list_iterator(compareIn, listIn))

ans =

   2.1695e-05
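A quick way to confirm that the single strcmp call returns the same result as the loop is to run both on the test inputs used above. This is a correctness check only, not a benchmark; the names idxLoop and idxVec are illustrative.

```matlab
% Same test inputs as the timings above.
compareIn = 'abcd';
listIn = repmat({'abcd'; 'a'; 'b'}, 1000, 1);   % 3000-by-1 cell array of char

% Loop version with a preallocated logical vector.
idxLoop = false(length(listIn), 1);
for j = 1:length(listIn)
    idxLoop(j) = strcmp(compareIn, listIn{j});
end

% Vectorized version: strcmp(char, cell) compares against every cell element
% and returns a logical array the same size as the cell array.
idxVec = strcmp(compareIn, listIn);

disp(isequal(idxLoop, idxVec))   % 1
disp(nnz(idxVec))                % 1000 ('abcd' appears once per repetition)
```

Because strcmp accepts the cell array directly, the MATLAB-level loop disappears entirely, which is consistent with the large drop in the timings above.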
