Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)

Question

I have one string and a cell array of strings.

str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};

I want to obtain:

idx = [2, 3, 6, 8];

I have written a very long piece of code that:

finds the elements whose length is not greater than length(str);
removes the elements containing characters not included in str;
finally, checks the characters of each remaining element one by one.

Essentially, it is an almost brute-force approach and runs very slowly. Is there a simple, fast way to do this?

NB: I have edited the question to make clear that a character may appear in an element up to n times if it appears n times in str. Thanks to Shai for pointing this out.
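The rule in the NB amounts to a multiset-inclusion test: an element qualifies when, for every character it contains, it uses that character no more often than str does. Below is a minimal per-element sketch of that test. It is not the poster's code, and the names fits and idx are only illustrative. A single count comparison covers both the "length" and "characters not in str" checks, because a character missing from str has a count of zero there:

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aac'};

% fits(w) is true when every distinct character c of w occurs in w
% no more often than it occurs in str (characters absent from str
% have a count of 0 there, so they are rejected automatically).
fits = @(w) all(arrayfun(@(c) sum(w == c) <= sum(str == c), double(unique(w))));

% Apply the test to every element and keep the indices of the matches.
idx = find(cellfun(fits, dic))
```

For the example in the question this gives idx = [2 3 6 8].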

Mohsen Nosratinia · Accepted Answer

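You can sort the characters of each string and then match them against a regular expression built from str: each distinct character of str, in sorted order, may occur from zero up to as many times as it occurs in str. For your example, the pattern is ^a{0,2}c{0,1}t{0,1}z{0,1}$: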

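u = unique(str);                                     % distinct characters of str, sorted
n = histc(double(str), double(u));                   % number of occurrences of each one
t = ['^' sprintf('%c{0,%d}', [double(u); n]) '$'];   % e.g. '^a{0,2}c{0,1}t{0,1}z{0,1}$'
s = cellfun(@sort, dic, 'uni', 0);                   % sort the characters of each element
idx = find(~cellfun('isempty', regexp(s, t)));       % indices of the elements that match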
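The pattern above uses each character of str literally. That is safe for letters like those in the example, but not for regular-expression metacharacters: a '.' in str would match any character, and characters such as '+' or '(' would change or break the pattern. The sketch below assumes str may contain such characters and uses the same approach, escaping each one with regexptranslate('escape', ...). The inputs are hypothetical:

```matlab
str = 'a.c+a';                        % hypothetical input with metacharacters
dic = {'..', 'ac', '+a.', 'aa+', 'abc'};

u = unique(str);                      % distinct characters of str, sorted
n = histc(double(str), double(u));    % number of occurrences of each one

% Build one "<char>{0,<count>}" piece per character, escaping the
% character so that '.', '+', '(' and similar are matched literally.
parts = arrayfun(@(k) sprintf('%s{0,%d}', regexptranslate('escape', u(k)), n(k)), ...
                 1:numel(u), 'UniformOutput', false);
t = ['^' parts{:} '$'];

s = cellfun(@sort, dic, 'UniformOutput', false);   % sort each element's characters
idx = find(~cellfun('isempty', regexp(s, t)))
```

Here the pattern becomes ^\+{0,1}\.{0,1}a{0,2}c{0,1}$ and idx is [2 3 4].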
