Reputation: 1271
I have a character array list and wish to tally the number of substring occurrences against an index held in a numerical vector chr:
list =
CCNNCCCNNNCNNCN
chr =
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
Ordinarily, I search for adjacent letter pairs such as 'NN' or 'CC', using this method:
Count(:,1) = accumarray(chr(intersect([strfind(list,'CC')],find(~diff(chr)))),1);
Here ~diff(chr) ensures that the pattern matching does not cross index boundaries.
However, I now want to match single-letter strings, e.g. 'N'. How can I accomplish this? With the above method, the last letter in each index is missed and not counted.
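To make the boundary handling concrete, here is a small sketch that splits the one-liner into steps, using the example data from this question. The names starts, sameGroup and valid are introduced only for illustration.

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

starts    = strfind(list, 'CC');           % start position of every 'CC', overlaps included
sameGroup = find(~diff(chr));              % positions k where chr(k) == chr(k+1)
valid     = intersect(starts, sameGroup);  % keep only pairs that stay inside one index
Count     = accumarray(chr(valid(:)), 1);  % tally the surviving pairs per index
% Count is [1; 2]: one 'CC' in index 1, two (overlapping) in index 2
```

In this example no 'CC' straddles the boundary between index 1 and index 2, so the mask removes nothing, but it would drop a pair starting at position 4.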
The desired result for the above example would be a two column matrix detailing the number of 'C's and 'N's within each index:
C N
2 2
5 6
i.e. there are 2 C's and 2 N's within index 1 (stored in chr); the count then restarts from 0 for index 2, where there are 5 C's and 6 N's.
Upvotes: 0
Views: 61
Reputation: 112669
[u, ~, v] = unique(list); %// get unique labels for list in variable v
result = full(sparse(chr, v, 1)); %// accumulate combinations of chr and v
This works for an arbitrary number of letters in list, an arbitrary number of indices in chr, and chr not necessarily sorted.
In your example
list = 'CCNNCCCNNNCNNCN';
chr = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';
which produces
result =
2 2
5 6
The letter associated with each column of result is given by u:
u =
CN
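If the columns need to follow a fixed letter order, or some letters might be absent from list, one possible variation is to map against a chosen alphabet instead of using unique. The sketch below is not part of the approach above: it assumes a hypothetical variable letters holding that alphabet, and it uses ismember with accumarray in place of unique with sparse.

```matlab
list = 'CCNNCCCNNNCNNCN';
chr  = [1 1 1 1 2 2 2 2 2 2 2 2 2 2 2].';

letters = 'CN';                        % assumed alphabet; fixes the column order
[tf, col] = ismember(list, letters);   % col(k) = column for list(k), 0 if not in letters

% Force column vectors so the subscripts form an n-by-2 matrix.
tf  = tf(:);
col = col(:);
grp = chr(:);

% Tally (index, letter) pairs, skipping characters outside the alphabet.
result = accumarray([grp(tf) col(tf)], 1, [max(grp) numel(letters)]);
% result is [2 2; 5 6] for this input
```

Passing the size explicitly keeps a column for every letter in letters, even when a letter never occurs in list.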
Upvotes: 3