calib_san

Reputation: 101

Counting and displaying sum of occurrences

Part of my data (a cell array of strings) is shown below. I want to count the occurrences of particular strings (e.g., 'P0702' and 'P0882') and display the total for each string in the form shown below:

'1FA'   '2012'  'F' ''  ''  ''  ''  ''  'P0702' 'P0882' 
'1Fc'   '2012'  'r' ''  ''  ''  ''  ''  'P0702' ''  ''  ''  
'1FA'   '2012'  'f' ''  ''  ''  ''  ''  'P0702' 'P0882' ''  
'1FA'   '2012'  'y' ''  ''  ''  'P0702' ''  ''  ''  ''  ''  
'1FA'   '2012'  'g' ''  ''  ''  ''  ''  ''  ''  ''  ''  ''  
'1FA'   '2012'  'u' ''  'P0702' 'P0882' ''  ''  ''  ''  ''  
'1FA'   '2012'  'y' ''  'P0702' ''  ''  ''  ''  ''  ''  ''  
'1FA'   '2012'  'n' ''  'P0702' ''  ''  ''  ''  ''  ''  ''  
'1FA'   '2012'  'j' ''  ''  ''  ''  ''  ''  ''  ''  'P0702'                                
'1FA'   '2012'  'u' 'P0702' ''  ''  ''  ''  ''  ''  ''  ''  
'1FM'   '2013'  'x' ''  ''  ''  ''  ''  'P1921' ''  ''  ''
'1FM'   '2013'  'c' ''  'P1711' ''  ''  ''  ''  ''  ''  ''
'1FM'   '2013'  'c' ''  ''  ''  ''  ''  'P0702' 'P0882' ''
'1FM'   '2009'  'E' ''  ''  ''  ''  ''  ''  ''  'P0500' 

Output:

        sum of counts above      
P0702   10
P0500    1
P1711    1

and so on.

I tried using sum(strcmp(d,{'P0882'}),2), which tells me how many times 'P0882' occurs in each row, but it would be tedious to repeat this for every string.
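For a single code, sum(..., 2) returns one count per row; counting over the whole array (for example with nnz) gives the total instead. A minimal sketch with a small hypothetical cell array d:

```matlab
% Hypothetical 3x4 cell array of strings standing in for the real data
d = {'1FA' 'P0702' 'P0882' '';
     '1FA' ''      'P0702' '';
     '1FM' 'P1711' ''      'P0702'};

perRow = sum(strcmp(d, 'P0702'), 2);   % occurrences of 'P0702' in each row
total  = nnz(strcmp(d, 'P0702'));      % occurrences of 'P0702' in the whole array
```

This still handles only one code per call, which is the limitation described in the question.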

Upvotes: 3

Views: 111

Answers (3)

Luis Mendo

Reputation: 112659

You can count occurrences of all strings without loops. Let C be your cell array:

[uniqueStrings, ~, v] = unique(C);      % C(:) equals uniqueStrings(v)
counts = histc(v, 1:max(v));            % number of cells mapped to each unique string
result = [uniqueStrings(:) num2cell(counts(:))];

In your example, this gives

result = 
    ''         [81]
    '1FA'      [ 9]
    '1FM'      [ 4]
    '1Fc'      [ 1]
    '2009'     [ 1]
    '2012'     [10]
    '2013'     [ 3]
    'E'        [ 1]
    'F'        [ 1]
    'P0500'    [ 1]
    'P0702'    [10]
    'P0882'    [ 4]
    'P1711'    [ 1]
    'P1921'    [ 1]
    'c'        [ 2]
    'f'        [ 1]
    'g'        [ 1]
    'j'        [ 1]
    'n'        [ 1]
    'r'        [ 1]
    'u'        [ 2]
    'x'        [ 1]
    'y'        [ 2]
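The result above also counts empty cells and the entries of the first three columns. A minimal sketch that restricts the count to columns 4 onward (the layout used in the question), drops empty strings, and lists the most frequent code first. It uses accumarray in place of histc, which MathWorks now lists as not recommended; the sample data is hypothetical:

```matlab
% Hypothetical sample: first three columns are metadata, the rest hold codes
C = {'1FA' '2012' 'F' ''      'P0702' 'P0882';
     '1FA' '2012' 'y' 'P0702' ''      '';
     '1FM' '2013' 'c' ''      'P1711' 'P0702'};

codes = C(:, 4:end);                        % keep only the code columns
codes = codes(~cellfun(@isempty, codes));   % drop empty cells (result is a column)

[uniqueCodes, ~, v] = unique(codes);        % codes equals uniqueCodes(v)
counts = accumarray(v, 1);                  % occurrences of each unique code

[counts, order] = sort(counts, 'descend');  % most frequent code first
result = [uniqueCodes(order) num2cell(counts)]
```

For this sample, 'P0702' comes first with a count of 3.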

Upvotes: 1

Robert Seifert

Reputation: 25232

If you have the Statistics Toolbox, you can simply use tabulate:

%// get only relevant part
X = data(:,4:end);

%// tabulate
tabulate(X(:))

It already gives a nicely formatted output:

  Value    Count   Percent
  P0702       10     58.82%
  P1711        1      5.88%
  P0882        4     23.53%
  P1921        1      5.88%
  P0500        1      5.88%

Alternatively with standard functions:

X = data(:,4:end);                      % code columns only
[a,~,x] = unique(X(~strcmp(X,'')));     % distinct non-empty codes and their indices
occ = hist(x(:),1:numel(a));            % count of each index
out = [a num2cell(occ).']
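On MATLAB releases that have categorical arrays (R2013b and later), categorical together with categories and countcats is another toolbox-free way to build the same two-column list. A minimal sketch, assuming X holds only the code columns as above; the sample values are hypothetical:

```matlab
% Hypothetical code columns (data(:,4:end) in the answer above)
X = {''      'P0702' 'P0882';
     'P0702' ''      '';
     ''      'P1711' 'P0702'};

codes = X(~strcmp(X, ''));      % non-empty codes as a column
cats  = categorical(codes);     % one category per distinct code
names = categories(cats);       % category names as a cell column, sorted
occ   = countcats(cats);        % occurrences of each category
out   = [names num2cell(occ)]
```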

Upvotes: 2

Benoit_11

Reputation: 13945

You could do the following: apply strcmp as you proposed, but inside a loop over the unique strings found in the data.

I modified the data you provided slightly so that every row has the same number of columns. The code is commented and easy to follow:

C = {'1FA'   '2012'  'F' ''  ''  ''  ''  ''  'P0702' 'P0882' ;
'1Fc'   '2012'  'r' ''  ''  ''  ''  ''  'P0702' '';
'1FA'   '2012'  'f' ''  ''  ''  ''  ''  'P0702' 'P0882';
'1FA'   '2012'  'y' ''  ''  ''  'P0702' ''  ''  '';
'1FA'   '2012'  'g' ''  ''  ''  ''  ''  ''  '';
'1FA'   '2012'  'u' ''  'P0702' 'P0882' ''  ''  ''  ''  ;
'1FA'   '2012'  'y' ''  'P0702' ''  ''  ''  ''  '' ;
'1FA'   '2012'  'n' ''  'P0702' ''  ''  ''  ''  '' ;
'1FA'   '2012'  'j' ''  ''  ''  ''  ''  ''  'P0702' ;  
'1FA'   '2012'  'u' 'P0702' ''  ''  ''  ''  '' '' ;
'1FM'   '2013'  'x' ''  ''  ''  ''  ''  'P1921' '';
'1FM'   '2013'  'c' ''  'P1711' ''  ''  ''  ''  '';
'1FM'   '2013'  'c' ''  ''  ''  ''  ''  'P0702' 'P0882';
'1FM'   '2009'  'E' ''  ''  ''  ''  ''  '' 'P0500'}

%// Find the unique strings whose occurrences will be counted.
[strings,~,~] = unique(C(:,4:end));

%// Remove empty cells automatically.
strings = strings(~cellfun(@isempty,strings));

%// Initialize output cell array
Output = cell(numel(strings),2);

%// Count occurrences. The two assignments could also be combined into one using concatenation.
for k = 1:numel(strings)

    Output{k,1} = strings{k};    
    Output{k,2} = sum(sum(strcmp(C(:,4:end),strings{k})));

end

Let's make a nice table out of this:

T = table(Output(:,2),'RowNames',Output(:,1),'VariableNames',{'TotalOccurences'})

Output:

T = 

             TotalOccurences
             _______________

    P0500    [ 1]           
    P0702    [10]           
    P0882    [ 4]           
    P1711    [ 1]           
    P1921    [ 1]
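Because Output(:,2) is a cell array, the table column above holds cells, which is why the counts are shown in brackets. Converting it with cell2mat gives a numeric column that can be sorted. A minimal sketch with hypothetical Output contents and a hypothetical variable name Count:

```matlab
% Hypothetical name/count pairs in the same layout as Output above
Output = {'P0500' 1; 'P0702' 10; 'P0882' 4};

T = table(cell2mat(Output(:,2)), 'RowNames', Output(:,1), ...
          'VariableNames', {'Count'});   % numeric column instead of cells
T = sortrows(T, 'Count', 'descend');     % most frequent code first
```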

If you don't have access to the table function (introduced in R2013b), you can build a cell array with a header row instead and adjust the loop slightly:

%// Initialize output cell array
Output = cell(numel(strings)+1,2);

%// Count occurrences
for k = 1:numel(strings)

    Output{k+1,1} = strings{k};    
    Output{k+1,2} = sum(sum(strcmp(C(:,4:end),strings{k})));

end

Output(1,:) = {'Data' 'Occurence'}

Output:

Output = 

    'Data'     'Occurence'
    'P0500'    [        1]
    'P0702'    [       10]
    'P0882'    [        4]
    'P1711'    [        1]
    'P1921'    [        1]
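Either loop above can also be written as a single cellfun call that applies the same strcmp count to every unique string. A minimal sketch with a small hypothetical C; the name uniqueCodes is used instead of strings only to avoid shadowing the built-in strings function in newer releases:

```matlab
% Hypothetical data: first three columns are metadata, the rest hold codes
C = {'1FA' '2012' 'F' ''      'P0702' 'P0882';
     '1FM' '2013' 'c' 'P0702' 'P1711' ''};

codes       = C(:, 4:end);
uniqueCodes = unique(codes(~cellfun(@isempty, codes)));   % distinct non-empty codes

% For each code, count the matching cells in the code columns
counts = cellfun(@(s) nnz(strcmp(codes, s)), uniqueCodes);
Output = [uniqueCodes num2cell(counts)]
```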

Upvotes: 2
