Reputation: 522
Let's say we have a cell array of substrings, arrayOfSubstrings = {substr1;substr2}, and a cell array of strings, arrayOfStrings = {string1;string2;string3;string4}. How can I get a logical map into the cell array of strings that marks where at least one of the substrings is found? I have tried
cellfun('isempty',regexp(arrayOfSubstrings ,arrayOfStrings ))
and
cellfun('isempty', strfind(arrayOfSubstrings , arrayOfStrings ))
and some other permutations of functions, but am not getting anywhere.
Upvotes: 1
Views: 432
Reputation: 706
If you are using R2016b or later, you can just use contains:
>> strings = {'ab', 'bc', 'de', 'fa'};
>> substrs = {'a', 'b', 'c'};
>> contains(strings, substrs)
ans =
1×4 logical array
1 1 0 1
contains is also the fastest of the approaches timed below, especially if you use the new string data type.
function profFunc()
    strings = {'ab', 'bc', 'de', 'fa'};
    substrs = {'a', 'b', 'c'};
    n = 10000;
    % 1) Explicit loop over the strings, escaped patterns
    tic;
    for i = 1:n
        substrs_translated = regexptranslate('escape', substrs);
        matches = false(size(strings));
        for k = 1:numel(strings)
            matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs_translated)));
        end
    end
    toc
    % 2) Nested cellfun
    tic;
    for i = 1:n
        cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings);
    end
    toc
    % 3) Single combined regular expression
    tic;
    for i = 1:n
        pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
        output = ~cellfun('isempty', regexp(strings, pattern)); %#ok<NASGU>
    end
    toc
    % 4) contains on a cell array of character vectors
    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc
    % 5) contains on a string array
    %Imagine you were using strings for all your text!
    strings = string(strings);
    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc
end
Timing results:
>> profFunc
Elapsed time is 0.643176 seconds.
Elapsed time is 1.007309 seconds.
Elapsed time is 0.683643 seconds.
Elapsed time is 0.050663 seconds.
Elapsed time is 0.008177 seconds.
Upvotes: 1
Reputation: 65430
The issue with both strfind and regexp is that you can't provide two cell arrays and have them automatically apply all patterns to all strings. You will need to loop through one or the other to make it work.
You can do this with an explicit loop:
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};
% First you'll want to escape the regular expressions
substrs = regexptranslate('escape', substrs);
matches = false(size(strings));
for k = 1:numel(strings)
    matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs)));
end
% 1 1 0 1
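To illustrate why the escaping step matters, here is a minimal sketch with a substring that contains a regular expression metacharacter. The variable names (demoStrings, literalPattern, rawHits, escapedHits) are made up for this example and are not part of the original code.

```matlab
% Hypothetical inputs: the substring contains '.', which regexp treats as
% "any character" unless it is escaped.
demoStrings = {'a.b', 'axb'};
literalPattern = 'a.b';

% Unescaped: '.' matches any character, so both strings are reported as hits.
rawHits = ~cellfun('isempty', regexp(demoStrings, literalPattern));

% Escaped: the pattern becomes 'a\.b', so only the literal text matches.
escapedHits = ~cellfun('isempty', ...
    regexp(demoStrings, regexptranslate('escape', literalPattern)));

disp(rawHits)      % 1 1
disp(escapedHits)  % 1 0
```

If the substrings are known to contain only letters and digits, the escaping step changes nothing, but it is cheap insurance for arbitrary input.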
Or, if you are averse to for loops, you can use cellfun:
cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings)
% 1 1 0 1
Alternatively, you could combine your substrings into a single regular expression:
pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
% (a|b|c)
output = ~cellfun('isempty', regexp(strings, pattern));
% 1 1 0 1
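The examples above all loop over the strings with regexp. As a sketch of the other direction (looping over the substrings instead), strfind can be called with the whole cell array of strings and one substring at a time. Because strfind does literal matching, no escaping is needed. This is only an illustrative variant, not something that was benchmarked here.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Start with "no match" for every string, then OR in the hits per substring.
matches = false(size(strings));
for k = 1:numel(substrs)
    % strfind accepts a cell array of text and a single literal pattern,
    % returning one (possibly empty) index vector per string.
    hits = ~cellfun('isempty', strfind(strings, substrs{k}));
    matches = matches | hits;
end

disp(matches)  % 1 1 0 1
```

This loops over the (usually shorter) list of substrings, which may be preferable when there are many strings and only a few substrings.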
Upvotes: 3