Confounded

Reputation: 522

MATLAB find cell array substrings in a cell array of strings

Let's say we have a cell array of substrings, arrayOfSubstrings = {substr1;substr2}, and a cell array of strings, arrayOfStrings = {string1;string2;string3;string4}. How can I get a logical mask over the cell array of strings that is true wherever at least one of the substrings is found? I have tried

cellfun('isempty',regexp(arrayOfSubstrings ,arrayOfStrings ))

and

cellfun('isempty', strfind(arrayOfSubstrings , arrayOfStrings ))

and some other permutations of functions, but am not getting anywhere.

Upvotes: 1

Views: 432

Answers (2)

matlabbit

Reputation: 706

If you are using R2016b or later, you can just use contains:

>> strings = {'ab', 'bc', 'de', 'fa'};
>> substrs = {'a', 'b', 'c'};
>> contains(strings, substrs)

ans =

  1×4 logical array

   1   1   0   1
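As a small aside, contains also accepts an 'IgnoreCase' name-value option, which may help if the strings differ in case from the substrings. The sketch below is only an illustration: the mixed-case inputs and the variable names maskSensitive and maskInsensitive are made up for this example and are not from the question.

```matlab
% Illustrative inputs (not from the question): same letters, mixed case
strings = {'AB', 'bc', 'de', 'FA'};
substrs = {'a', 'b', 'c'};

% Default matching is case-sensitive, so only 'bc' matches
maskSensitive = contains(strings, substrs);
% maskSensitive = 0 1 0 0

% With 'IgnoreCase', 'AB' and 'FA' match as well
maskInsensitive = contains(strings, substrs, 'IgnoreCase', true);
% maskInsensitive = 1 1 0 1
```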

contains is also the fastest of the approaches timed below, especially if you use the new string datatype.

function profFunc()

    strings = {'ab', 'bc', 'de', 'fa'};
    substrs = {'a', 'b', 'c'};

    n = 10000;

    tic;
    for i = 1:n
        substrs_translated = regexptranslate('escape', substrs);

        matches = false(size(strings));

        for k = 1:numel(strings)
            matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs_translated)));
        end
    end
    toc

    tic;
    for i = 1:n
        cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings);
    end
    toc

    tic;
    for i = 1:n
        pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
        output = ~cellfun('isempty', regexp(strings, pattern)); %#ok<NASGU>
    end
    toc

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc

    %Imagine you were using strings for all your text!
    strings = string(strings);

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc
end

Timing results:

>> profFunc
Elapsed time is 0.643176 seconds.
Elapsed time is 1.007309 seconds.
Elapsed time is 0.683643 seconds.
Elapsed time is 0.050663 seconds.
Elapsed time is 0.008177 seconds.
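If you would rather not hand-roll tic/toc loops, timeit is one alternative that handles warm-up and repetition for you. This is only a sketch of how two of the cases above might be timed that way; the variable names tRegexp and tContains are made up here, and the numbers will vary by machine and release.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Build the combined pattern once, outside the timed call
pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];

% timeit runs each handle several times and returns a typical time per call
tRegexp   = timeit(@() ~cellfun('isempty', regexp(strings, pattern)));
tContains = timeit(@() contains(strings, substrs));

fprintf('regexp:   %.3g s per call\n', tRegexp);
fprintf('contains: %.3g s per call\n', tContains);
```

Note that, unlike the loops above, this sketch builds the pattern outside the timed call, so the regexp figure excludes the cost of escaping and joining the substrings.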

Upvotes: 1

Suever

Reputation: 65430

The issue with both strfind and regexp is that you can't provide two cell arrays and have them automatically apply all patterns to all strings. You will need to loop through one or the other to make it work.

You can do this with an explicit loop:

strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% First, escape any regex special characters in the substrings
substrs = regexptranslate('escape', substrs);

matches = false(size(strings));

for k = 1:numel(strings)
    matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs)));
end

% 1  1  0  1

Or, if you are averse to for loops, you can use cellfun:

cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings)
% 1  1  0  1
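The same "loop over one side" idea also works with strfind, which takes a cell array of text and a single pattern. The sketch below loops over the substrings instead of the strings; since strfind does literal matching, no escaping step is needed. It assumes the original, unescaped substrings.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

matches = false(size(strings));

for k = 1:numel(substrs)
    % strfind returns one cell of start indices per string;
    % a non-empty cell means substrs{k} occurs in that string
    matches = matches | ~cellfun('isempty', strfind(strings, substrs{k}));
end

% 1  1  0  1
```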

A Different Approach

Alternatively, you could combine your substrings into a single regular expression:

pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
%   (a|b|c)

output = ~cellfun('isempty', regexp(strings, pattern));
%   1  1  0  1

Upvotes: 3
