AnnaSchumann

Reputation: 1271

Finding strings using an index - MATLAB

I have a character array, list, and want to count substring occurrences per group, where the group index of each character is held in a numeric vector, chr:

list =
CCNNCCCNNNCNNCN

chr =

     1
     1
     1
     1
     2
     2
     2
     2
     2
     2
     2
     2
     2
     2
     2

Ordinarily, I search for adjacent letter pairs, e.g. 'NN' or 'CC', using this method:

Count(:,1) = accumarray(chr(intersect([strfind(list,'CC')],find(~diff(chr)))),1);

Here, ~diff(chr) ensures that a pattern match does not cross an index boundary.

However, I now want to match single-letter strings, e.g. 'N'. How can I accomplish this? With the above method, the last letter in each index group is missed and not counted.

The desired result for the above example would be a two column matrix detailing the number of 'C's and 'N's within each index:

C     N
2     2
5     6

That is, there are 2 C's and 2 N's within index 1 (as stored in chr); the count then restarts from 0 for index 2, where there are 5 C's and 6 N's.

Upvotes: 0


Answers (1)

Luis Mendo

Reputation: 112669

[u, ~, v] = unique(list);          %// get unique labels for list in variable v
result = full(sparse(chr, v, 1));  %// accumulate combinations of chr and v

This works for an arbitrary number of letters in list, an arbitrary number of indices in chr, and chr not necessarily sorted.
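As a quick check of that claim, here is a minimal sketch on made-up data with three letters and an unsorted index. The variables list2, chr2, u2, v2 and result2 are hypothetical and not part of the question; v2(:) is used only to force a column vector regardless of MATLAB version.

```matlab
% Hypothetical data: three distinct letters, index vector not sorted
list2 = 'ABCABA';
chr2  = [2 1 2 1 3 2].';

[u2, ~, v2] = unique(list2);              % u2 = 'ABC'; v2 maps each letter to its column
result2 = full(sparse(chr2, v2(:), 1));   % repeated (row, column) pairs are summed

% result2 =
%      1     1     0
%      2     0     1
%      0     1     0
```

Row k of result2 holds the counts for index k, and column j corresponds to the letter u2(j). The accumulation works because sparse adds together the values of repeated (row, column) subscripts.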

For the example in the question, with

list = 'CCNNCCCNNNCNNCN';
chr = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

the code above produces

result =
     2     2
     5     6

The letter associated with each column of result is given by u:

u =
CN
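Since the question already uses accumarray, the same tally could also be sketched with it instead of sparse. This is an alternative shown under the assumption that chr contains positive integers; the variable names counts and countN are illustrative only.

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

% All letters at once: two-column subscripts [index, letter label]
[u, ~, v] = unique(list);
counts = accumarray([chr(:), v(:)], 1);   % rows: indices in chr, columns: letters in u
% counts =
%      2     2
%      5     6

% A single letter only, e.g. 'N': select its positions, then tally per index
countN = accumarray(chr(list(:) == 'N'), 1, [max(chr) 1]);
% countN =
%      2
%      6
```

The explicit size [max(chr) 1] keeps one row per index even when the highest index contains no occurrence of the letter.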

Upvotes: 3
