Reputation: 101
Part of my data (a cell array of strings) is shown below. I want to count the occurrences of particular strings (e.g., 'P0702', 'P0882', etc.) and display the total count for each, in the form of the output shown below:
'1FA' '2012' 'F' '' '' '' '' '' 'P0702' 'P0882'
'1Fc' '2012' 'r' '' '' '' '' '' 'P0702' '' '' ''
'1FA' '2012' 'f' '' '' '' '' '' 'P0702' 'P0882' ''
'1FA' '2012' 'y' '' '' '' 'P0702' '' '' '' '' ''
'1FA' '2012' 'g' '' '' '' '' '' '' '' '' '' ''
'1FA' '2012' 'u' '' 'P0702' 'P0882' '' '' '' '' ''
'1FA' '2012' 'y' '' 'P0702' '' '' '' '' '' '' ''
'1FA' '2012' 'n' '' 'P0702' '' '' '' '' '' '' ''
'1FA' '2012' 'j' '' '' '' '' '' '' '' '' 'P0702'
'1FA' '2012' 'u' 'P0702' '' '' '' '' '' '' '' ''
'1FM' '2013' 'x' '' '' '' '' '' 'P1921' '' '' ''
'1FM' '2013' 'c' '' 'P1711' '' '' '' '' '' '' ''
'1FM' '2013' 'c' '' '' '' '' '' 'P0702' 'P0882' ''
'1FM' '2009' 'E' '' '' '' '' '' '' '' 'P0500'
Output:
sum of counts above
P0702 15
P0500 1
P1711 1
and so on.
I tried sum(strcmp(d,{'P0882'}),2), which tells me how many times 'P0882' occurs in each row, but repeating that for every string would be tedious.
Upvotes: 3
Views: 111
Reputation: 112659
You can count the occurrences of all strings without loops. Let C be your cell array.
[uniqueStrings, ~, v] = unique(C);
counts = histc(v, 1:max(v));
result = [uniqueStrings(:) num2cell(counts(:))];
In your example, this gives
result =
'' [81]
'1FA' [ 9]
'1FM' [ 4]
'1Fc' [ 1]
'2009' [ 1]
'2012' [10]
'2013' [ 3]
'E' [ 1]
'F' [ 1]
'P0500' [ 1]
'P0702' [10]
'P0882' [ 4]
'P1711' [ 1]
'P1921' [ 1]
'c' [ 2]
'f' [ 1]
'g' [ 1]
'j' [ 1]
'n' [ 1]
'r' [ 1]
'u' [ 2]
'x' [ 1]
'y' [ 2]
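If only the code columns matter and the empty strings should not be counted, the same idea can be restricted before calling unique. The following is a minimal sketch, not part of the original answer: it assumes the codes live in columns 4 onward, uses a small made-up sample instead of the full data, and swaps histc for accumarray, since histc is discouraged in newer MATLAB releases.

```matlab
% Small made-up sample (hypothetical, not the asker's full data)
C = {'1FA' '2012' 'F' ''      'P0702' 'P0882';
     '1Fc' '2012' 'r' 'P0702' ''      '';
     '1FM' '2013' 'c' ''      'P1711' 'P0702'};

codes = C(:, 4:end);                        % keep only the code columns
codes = codes(~cellfun(@isempty, codes));   % drop empty strings (gives a column)

[uniqueStrings, ~, v] = unique(codes);      % v maps each entry to its unique string
counts = accumarray(v(:), 1);               % tally the entries per unique string

result = [uniqueStrings(:) num2cell(counts)]
```

Here result should have one row per distinct code, with the code in the first column and its count in the second.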
Upvotes: 1
Reputation: 25232
If you have the Statistics Toolbox, you can simply use tabulate:
%// get only relevant part
X = data(:,4:end);
%// tabulate
tabulate(X(:))
It already gives a nicely formatted output:
Value Count Percent
P0702 10 58.82%
P1711 1 5.88%
P0882 4 23.53%
P1921 1 5.88%
P0500 1 5.88%
Alternatively with standard functions:
X = data(:,4:end)
[a,~,x] = unique(X(~strcmp(X,'')))
occ = hist(x(:),1:numel(a))
out = [a num2cell(occ).']
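Another toolbox-free route, sketched here as an assumption rather than as part of the original answer, is to convert the non-empty codes to a categorical array and let countcats do the tally. This needs a MATLAB release that includes categorical arrays, and the sample data below is made up for illustration.

```matlab
% Small made-up sample (hypothetical, not the asker's full data)
data = {'1FA' '2012' 'F' ''      'P0702' 'P0882';
        '1Fc' '2012' 'r' 'P0702' ''      '';
        '1FM' '2013' 'c' ''      'P1711' 'P0702'};

X = data(:, 4:end);            % get only the relevant part
X = X(~strcmp(X, ''));         % remove empty strings (gives a column)

cats  = categorical(X);        % one category per distinct code
names = categories(cats);      % category names, sorted
occ   = countcats(cats);       % number of elements in each category

out = [names(:) num2cell(occ(:))]
```

The rows of out follow the order returned by categories, so names and counts stay aligned.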
Upvotes: 2
Reputation: 13945
You could do the following: apply strcmp as you proposed, but in a loop over the unique strings (data names) to count, which are determined beforehand.
I modified the data you provided slightly so that the dimensions fit. The code is commented and fairly easy to follow:
C = {'1FA' '2012' 'F' '' '' '' '' '' 'P0702' 'P0882' ;
'1Fc' '2012' 'r' '' '' '' '' '' 'P0702' '';
'1FA' '2012' 'f' '' '' '' '' '' 'P0702' 'P0882';
'1FA' '2012' 'y' '' '' '' 'P0702' '' '' '';
'1FA' '2012' 'g' '' '' '' '' '' '' '';
'1FA' '2012' 'u' '' 'P0702' 'P0882' '' '' '' '' ;
'1FA' '2012' 'y' '' 'P0702' '' '' '' '' '' ;
'1FA' '2012' 'n' '' 'P0702' '' '' '' '' '' ;
'1FA' '2012' 'j' '' '' '' '' '' '' 'P0702' ;
'1FA' '2012' 'u' 'P0702' '' '' '' '' '' '' ;
'1FM' '2013' 'x' '' '' '' '' '' 'P1921' '';
'1FM' '2013' 'c' '' 'P1711' '' '' '' '' '';
'1FM' '2013' 'c' '' '' '' '' '' 'P0702' 'P0882';
'1FM' '2009' 'E' '' '' '' '' '' '' 'P0500'}
%// Find the unique strings whose occurrences we want to count.
strings = unique(C(:,4:end));
%// Remove empty cells automatically.
strings = strings(~cellfun(@isempty,strings));
%// Initialize output cell array
Output = cell(numel(strings),2);
%// Count occurrences. You can combine the 2 lines into one using concatenation.
for k = 1:numel(strings)
Output{k,1} = strings{k};
Output{k,2} = sum(sum(strcmp(C(:,4:end),strings{k})));
end
Let's make a nice table out of this:
T = table(Output(:,2),'RowNames',Output(:,1),'VariableNames',{'TotalOccurences'})
Output:
T =
TotalOccurences
_______________
P0500 [ 1]
P0702 [10]
P0882 [ 4]
P1711 [ 1]
P1921 [ 1]
If you don't have access to the table function, you can create a cell array with headers and change the loop slightly:
%// Initialize output cell array
Output = cell(numel(strings)+1,2);
%// Count occurrences
for k = 1:numel(strings)
Output{k+1,1} = strings{k};
Output{k+1,2} = sum(sum(strcmp(C(:,4:end),strings{k})));
end
%T = table(Output(:,2),'RowNames',Output(:,1),'VariableNames',{'TotalOccurences'})
Output(1,:) = {'Data' 'Occurence'}
Output:
Output =
'Data' 'Occurence'
'P0500' [ 1]
'P0702' [ 10]
'P0882' [ 4]
'P1711' [ 1]
'P1921' [ 1]
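For comparison, the counting loop can also be written with cellfun. This is only a sketch under assumptions: it uses a small made-up sample rather than the full array above, and the variable name counts and the header text are illustrative choices, not taken from the original code.

```matlab
% Small made-up sample (hypothetical, not the full data above)
C = {'1FA' '2012' 'F' ''      'P0702' 'P0882';
     '1Fc' '2012' 'r' 'P0702' ''      '';
     '1FM' '2013' 'c' ''      'P1711' 'P0702'};

X = C(:, 4:end);                                  % columns that hold the codes

% Unique strings to count, with empty cells removed
strings = unique(X);
strings = strings(~cellfun(@isempty, strings));

% For each unique string, count the cells of X that match it exactly
counts = cellfun(@(s) nnz(strcmp(X, s)), strings);

% Header row followed by one row per string
Output = [{'Data' 'Occurrence'}; strings(:) num2cell(counts(:))]
```

nnz(strcmp(X, s)) does the same job as the nested sum(sum(...)) in the loop, because strcmp returns a logical array with the same size as X.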
Upvotes: 2