I have a character array `list` (this can also be stored as a cell array if more useful) and wish to tally the number of substring occurrences against two different indexes held in two separate variables, `type` and `ind`.
list =
C C N N C U C N N N C N U N C N C
ind =
1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4
type =
15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16
The character array contains no spaces; they are added here for clarity.
Using the above example, the desired output would tally all instances of each unique letter in `list`, for each `ind` and for each `type`, creating three columns (for C/N/U), each with 4 rows (one per `ind`), per `type`. This is done using the order in which the entries in each array appear.
Desired output of above example (the labels are added for clarity only):
       Type 15     Type 16
Ind    C  N  U     C  N  U
1      2  0  0     1  1  0
2      1  2  0     0  1  0
3      1  1  1     1  1  1
4      0  1  0     1  1  0
I am only aware of how to do this with a single index (using `unique`, `full` and `sparse`).
How can I best go about doing this with a dual index?
Upvotes: 2
One possibility is to convert your letters to doubles by subtracting 64, so that the letter C maps to the number 3.
You can then use `unique` with the `'rows'` and `'stable'` options, followed by `accumarray`, to get the following result:
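list = char('CCNNCUCNNNCNUNCNC');
ind = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4];
type = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16];
%// one row per entry: [type, ind, letter code], where C -> 3, N -> 14, U -> 21
data = [type(:) ind(:) (list(:) - 64)];
%// unique rows in order of first appearance; c maps each entry to its row in a
[a,~,c] = unique(data,'rows','stable');
%// count how many entries fall into each unique row
occ = accumarray(c,ones(size(c)),[],@numel);
output = [a, occ]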
output =
15 1 3 2
15 2 14 2
15 2 3 1
15 3 21 1
15 3 3 1
15 3 14 1
15 4 14 1
16 1 14 1
16 1 3 1
16 2 14 1
16 3 21 1
16 3 14 1
16 3 3 1
16 4 14 1
16 4 3 1
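The numeric codes in the third column can be mapped back to letters by adding 64 again. A minimal sketch, assuming the same example data; `letters` and `typ` are my own names, used here to avoid shadowing the built-in `type` function:

```matlab
letters = 'CCNNCUCNNNCNUNCNC';
ind     = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4];
typ     = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16];

% Same counting steps as above, with letters encoded as A = 1, B = 2, ...
data = [typ(:), ind(:), double(letters(:)) - 64];
[a, ~, c] = unique(data, 'rows', 'stable');
occ = accumarray(c, 1);            % number of entries per unique row

% Decode column 3 back to characters and build a labelled cell array
decoded  = cellstr(char(a(:,3) + 64));
labelled = [num2cell(a(:,1:2)), decoded, num2cell(occ)];
disp(labelled(1:3,:))
```

Each row of `labelled` then holds the type, the ind, the letter and its count, which may be easier to read than the raw codes.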
If you have the Statistics Toolbox, you should consider using `grpstats`.
If you don't mind a mind-twisting output, then `crosstab` is by far the easiest solution:
output = crosstab(type(:),ind(:),list(:)-64)
%// type runs downwards, ind runs to the right
output(:,:,1) = %// 'C'
2 1 1 0
1 0 1 1
output(:,:,2) = %// 'N'
0 2 1 1
1 1 1 1
output(:,:,3) = %// 'U'
0 0 1 0
0 0 1 0
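If the page-per-letter layout is hard to read, the dimensions of the `crosstab` result can be reordered so that each page holds one type. A minimal sketch, assuming the Statistics Toolbox is available and the same example data; `letters`, `typ` and `byType` are my own names:

```matlab
% Requires the Statistics Toolbox (crosstab).
letters = 'CCNNCUCNNNCNUNCNC';
ind     = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4];
typ     = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16];

% 3-D contingency table with dimensions type x ind x letter
tbl = crosstab(typ(:), ind(:), double(letters(:)) - 64);

% Reorder the dimensions to ind x letter x type
byType = permute(tbl, [2 3 1]);

type15 = byType(:,:,1)   % rows: ind 1..4, columns: C N U
type16 = byType(:,:,2)
```

Each page of `byType` then has the same row and column layout as one half of the desired output.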
The following one-liner gives something close to your desired output:
output2 = reshape(crosstab(ind(:),list(:)-64,type(:)),4,[],1)
output2 =
2 0 0 1 1 0
1 2 0 0 1 0
1 1 1 1 1 1
0 1 0 1 1 0
Also in this toolbox, you can find the `tabulate` function, which offers another option in combination with `accumarray`:
[~,~,c] = unique([type(:) ind(:)],'rows','stable')
output = accumarray(c(:),list(:),[],@(x) {tabulate(x)} )
This also allows the following output:
d = unique([type(:) ind(:) list(:)-64],'rows','stable')
output2 = [num2cell(d(:,[1,2])) vertcat(output{:})]
output2 =
[15] [1] 'C' [2] [ 100]
[15] [2] 'N' [2] [66.6667]
[15] [2] 'C' [1] [33.3333]
[15] [3] 'U' [1] [33.3333]
[15] [3] 'C' [1] [33.3333]
[15] [3] 'N' [1] [33.3333]
[15] [4] 'N' [1] [ 100]
[16] [1] 'N' [1] [ 50]
[16] [1] 'C' [1] [ 50]
[16] [2] 'N' [1] [ 100]
[16] [3] 'U' [1] [33.3333]
[16] [3] 'N' [1] [33.3333]
[16] [3] 'C' [1] [33.3333]
[16] [4] 'N' [1] [ 50]
[16] [4] 'C' [1] [ 50]
Upvotes: 3
Use `accumarray`:
Output = accumarray([type',ind'],list');
You may need to convert `type` and `list` to numbers first using `str2num`, then apply `accumarray`, and convert the result back to characters using `num2str`.
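For counting occurrences, one option is to pass all three variables as subscripts to `accumarray` and accumulate a constant 1. A minimal sketch, assuming the letters and type values are first mapped to consecutive integers with `unique`; `letters`, `typ`, `counts` and `wide` are my own names (`typ` avoids shadowing the built-in `type`):

```matlab
letters = 'CCNNCUCNNNCNUNCNC';
ind     = [1 1 2 2 2 3 3 3 4 1 1 2 3 3 3 4 4];
typ     = [15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16];

% Map each variable to consecutive integer group indices
[uLetters, ~, iLetter] = unique(letters(:));   % 'C','N','U' -> 1,2,3
[uInd,     ~, iInd]    = unique(ind(:));       % 1..4
[uType,    ~, iType]   = unique(typ(:));       % 15,16 -> 1,2

% Count occurrences in a 3-D array: ind x letter x type
counts = accumarray([iInd, iLetter, iType], 1, ...
                    [numel(uInd), numel(uLetters), numel(uType)]);

% Place the per-type blocks side by side: C N U for type 15, then for type 16
wide = reshape(counts, numel(uInd), []);
disp(wide)
```

This needs no toolbox, and combinations that never occur are simply left as zeros.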
Upvotes: 0