Reputation:
I have to count how often a certain string is contained in a cell array. The problem is that the code is way too slow: it takes almost 1 second to do this.
uniqueWordsSize = 6; % just a sample number
wordsCounter = zeros(uniqueWordsSize, 1);
uniqueWords = unique(words); % words is a cell array
for i = 1:uniqueWordsSize
    % count how many entries of words match the i-th unique word
    wordsCounter(i) = sum(strcmp(uniqueWords(i), words));
end
What I'm currently doing is comparing every word in uniqueWords with the whole cell array words using strcmp, and then using sum to count the matches in the logical array that strcmp returns.
I hope someone can help me optimize this. 1 second for 6 words is just too much.
EDIT: ismember is even slower.
Upvotes: 1
Views: 1142
Reputation: 11
A tricky way, without any explicit for loops:
clc
close all
clear all
Paragraph=[lower(fileread('Temp1.txt')), ' ']; % append a delimiter so the last word is terminated
AlphabetFlag=Paragraph>=97 & Paragraph<=122; % flag the letters a-z (ASCII 97-122)
DelimFlag=find(AlphabetFlag==0); % treat every non-letter as a delimiter
WordLength=[DelimFlag(1), diff(DelimFlag)]; % segment lengths, each including its trailing delimiter
Paragraph(DelimFlag)=[]; % delete the delimiters
Words=mat2cell(Paragraph, 1, WordLength-1); % cut the paragraph into words
Words(cellfun('isempty', Words))=[]; % drop the empty "words" produced by consecutive delimiters
[SortWords, Ia, Ic]=unique(Words); % unique words, and for each word the index of its unique entry
Bincounts = histc(Ic,1:size(Ia, 1)); % count the occurrences of each unique word
[SortBincounts, IndBincounts]=sort(Bincounts, 'descend'); % order the counts from most to least frequent
FreqWords=SortWords(IndBincounts); % sort the words according to their frequency
Freq=SortBincounts/sum(SortBincounts)*100; % frequency percentage
%% plot
NMostCommon=min(20, numel(Freq)); % show at most 20 words (fewer if the text has fewer unique words)
disp(Freq(1:NMostCommon))
pie([Freq(1:NMostCommon); 100-sum(Freq(1:NMostCommon))], [FreqWords(1:NMostCommon), {'other words'}]);
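If the delimiter bookkeeping above feels fragile, a possibly simpler sketch splits the text with regexp and its 'match' option, then counts with accumarray instead of histc. It assumes, like the answer above, that only the lowercase ASCII letters a-z count as word characters; the sample text is hypothetical and stands in for fileread('Temp1.txt').

```matlab
% Hypothetical input; in the answer above this comes from fileread('Temp1.txt').
txt = 'The cat saw the dog. The dog ran!';

% Every maximal run of the letters a-z becomes one word (assumption: only
% lowercase ASCII letters count as word characters, as in the answer above).
words = regexp(lower(txt), '[a-z]+', 'match');

% ic(k) is the index into uniqueWords of words{k}.
[uniqueWords, ~, ic] = unique(words);

% Add 1 to bin ic(k) for every word; ic(:) forces a column vector.
counts = accumarray(ic(:), 1);

% Sort from most to least frequent.
[sortedCounts, order] = sort(counts, 'descend');
freqWords = uniqueWords(order);
```

Because regexp only returns non-empty matches, no empty "words" have to be removed afterwards.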
Upvotes: 0
Reputation: 74940
You can drop the loop completely by using the third output of unique together with hist:
words = {'a','b','c','a','a','c'}
[uniqueWords,~,wordOccurrenceIdx]=unique(words)
nUniqueWords = length(uniqueWords);
counts = hist(wordOccurrenceIdx,1:nUniqueWords)
uniqueWords =
'a' 'b' 'c'
wordOccurrenceIdx =
1 2 3 1 1 3
counts =
3 1 2
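The same counts can be sketched without hist, which newer MATLAB releases no longer recommend. accumarray adds 1 to bin wordOccurrenceIdx(k) for every k; the histcounts line assumes MATLAB R2014b or newer and uses bin edges centered on the integers 1 to nUniqueWords.

```matlab
words = {'a','b','c','a','a','c'};
[uniqueWords, ~, wordOccurrenceIdx] = unique(words);
nUniqueWords = numel(uniqueWords);

% counts(j) is the number of occurrences of uniqueWords{j};
% wordOccurrenceIdx(:) forces a column vector, so counts is a column.
counts = accumarray(wordOccurrenceIdx(:), 1);

% Alternative: one bin per unique word, edges 0.5, 1.5, ..., nUniqueWords + 0.5.
countsHc = histcounts(wordOccurrenceIdx, 0.5:1:nUniqueWords + 0.5);
```

Both give the counts 3, 1 and 2 for 'a', 'b' and 'c'.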
Upvotes: 3