HikeTakerByRequest

Reputation: 333

Finding string in cell array of cell arrays

In MATLAB, suppose we have a cell array of cell arrays. For example:

C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} }

I would like to find the indices of all of the cell arrays in C that contain the string 'hello'. In this case, I would expect 1 and 2, because the 1st cell array has 'hello' in the first slot and the 2nd cell array has it in the third slot.

I imagine this would be quite a bit easier with a matrix (a simple find), but for educational purposes I'd like to learn the process using a cell array of cell arrays as well.

Many thanks in advance.

Upvotes: 2

Views: 1614

Answers (3)

Divakar

Reputation: 221564

Straightforward Approaches

With arrayfun -

out = find(arrayfun(@(n) any(strcmp(C{n},'hello')),1:numel(C)))

With cellfun -

out = find(cellfun(@(x) any(strcmp(x,'hello')),C))
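To make the one-liner easier to follow, here is a minimal sketch that splits the cellfun version into its two steps, using the C from the question. The variable name hasHello is only illustrative and is not part of the original answer.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };

% strcmp on an inner cell array returns one logical per string;
% any() collapses that to a single true/false per inner cell array.
hasHello = cellfun(@(x) any(strcmp(x, 'hello')), C);   % logical [1 1 0]

% find converts the logical mask into indices.
out = find(hasHello);                                   % [1 2]
```

Because each inner cell array is processed on its own, this form does not care whether the inner cell arrays have the same length.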

Alternative Approach

An alternative is to first convert the cell array of cell arrays of strings into a plain cell array of strings, thus removing one level of "cell hierarchy". strcmp can then be applied to it directly, which avoids cellfun and arrayfun and might make this faster than the approaches listed above. Please note that this approach makes the most sense from a performance point of view when the inner cell arrays have similar lengths, since the conversion produces a 2D cell array in which the unused positions are filled with empty cells.

Here's the implementation -

%// Convert the cell array of cell arrays to a cell array of strings, i.e.
%// remove one level of "cell hierarchy"
lens = cellfun('length',C);
max_lens = max(lens);
C1 = cell(max_lens,numel(C));
C1(bsxfun(@le,[1:max_lens]',lens)) = [C{:}];  %//'

%// Use strcmp without cellfun, which might speed things up
out = find(any(strcmp(C1,'hello'),1))

Explanation:

[1] Convert cell array of cell arrays of strings to cell array of strings:

C = { {'hello' 'there' 'friend'}, {'do' 'hello'}, {'or' 'maybe' 'not'} }

gets converted to

C1 = {
    'hello'     'do'       'or'   
    'there'     'hello'    'maybe'
    'friend'         []    'not'  }

[2] For each column find if there's any string hello and find those column IDs as the final output.
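As a quick check, the two steps can be run on the ragged example from the explanation. This is only a sketch: the variable name mask is introduced here for illustration, and the code otherwise follows the implementation above.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'hello'}, {'or' 'maybe' 'not'} };

lens = cellfun('length', C);                    % [3 2 3]
% mask(k,j) is true when inner cell array j has a k-th element.
mask = bsxfun(@le, (1:max(lens)).', lens);      % 3x3 logical, false at the padding
C1 = cell(max(lens), numel(C));
C1(mask) = [C{:}];                              % fill column by column

% Empty cells never compare equal to a string, so padding cannot match.
out = find(any(strcmp(C1, 'hello'), 1));        % [1 2]
```

The padded position C1{3,2} stays empty, and strcmp returns false for it, so the padding does not affect the result.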

Upvotes: 5

Luis Mendo

Reputation: 112679

Assuming the inner cell arrays are horizontal and equal-sized (as in your example), and that you want to find exact matches of the string:

result = find(any(strcmp(vertcat(C{:}),'hello'), 2));

This works as follows:

  1. Convert your cell array of cell arrays of strings C into a 2D cell array of strings: vertcat(C{:})
  2. Compare each string with the sought string ('hello'): strcmp(...,'hello')
  3. Find indices of rows in which a match was found: find(any(..., 2))
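The three steps above can be sketched with their intermediate values, using the C from the question. The names M and tf are illustrative only. Note that vertcat requires every inner cell array to have the same number of elements, which is the assumption stated at the start of this answer.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} };

M = vertcat(C{:});            % 3x3 cell array of strings, one row per inner cell array
tf = strcmp(M, 'hello');      % 3x3 logical, true where the string matches exactly
result = find(any(tf, 2));    % column vector [1; 2]
```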

Upvotes: 0

Benoit_11

Reputation: 13945

Here is a way using regular expressions. I think it is far less efficient than @Divakar's strcmp solution, but it could be informative anyway.

regexp operates on cell arrays of strings, but since C is a cell array of cell arrays, we need cellfun to apply it to each inner cell array. The result is a cell array of cell arrays holding the match positions, and we use cellfun once more to fetch the indices of the matches. I might be using unnecessary steps, but I figured it was more intuitive this way.

Code:

clear
clc

C = { {'hello' 'there' 'friend'}, {'do' 'say' 'hello'}, {'or' 'maybe' 'not'} }

CheckWord = cellfun(@(x) regexp(x,'hello'),C,'uni',false);

Here CheckWord is a cell array of cell arrays. Each inner cell holds the starting index of the match with the string hello (1 in this example), or is empty if there is no match:

CheckWord = 

    {1x3 cell}    {1x3 cell}    {1x3 cell}

To make things a bit clearer, let's reshape CheckWord:

CheckWord = reshape([CheckWord{:}],numel(C),[]).'

CheckWord = 

    [1]    []     []
     []    []    [1]
     []    []     []

Since CheckWord is a cell array, we can use cellfun and find to look for non-empty cells, i.e. those corresponding to matches:

[row col] = find(~cellfun('isempty',CheckWord))

row =

     1
     2

col =

     1
     3

Therefore the cells containing the word "hello" are the 1st and 2nd.
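One caveat: the pattern 'hello' also matches longer strings that merely contain it. Below is a minimal sketch, on a modified example chosen here for illustration, that contrasts the unanchored pattern with an anchored one ('^hello$') for whole-string matches. It also skips the reshape step, so it does not assume equal-sized inner cell arrays.

```matlab
C = { {'hello' 'there' 'friend'}, {'do' 'say' 'helloworld'}, {'or' 'hello' 'not'} };

% Unanchored pattern: 'helloworld' counts as a match too.
loose = find(cellfun(@(x) any(~cellfun('isempty', regexp(x, 'hello'))), C));     % [1 2 3]

% Anchored pattern: only strings that are exactly 'hello' match.
exact = find(cellfun(@(x) any(~cellfun('isempty', regexp(x, '^hello$'))), C));   % [1 3]
```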

Hope that helps!

Upvotes: 5
