Reputation: 101
I am trying to use the following code to count the occurrences of the whole word "the" in a file, but it keeps returning zero. How would I make this work?
totalthe=length(regexp(strcat(lines{:}),'\bthe\b'))
Upvotes: 1
Views: 1055
Reputation: 21561
Here we go, based on the other answers, comments and some trial and error:
Suppose these are your lines:
lines = {'In the cell on the island'; 'there is the man.';'The end'}
Then this will count the occurrences of 'the', case-insensitively:
x = regexpi(lines,'\<the\>')
numel([x{:}])
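As a rough check of the snippet above, the per-line matches can be inspected before they are summed. This is only a sketch on the same sample lines; the variable names perLine and total are made up for illustration.

```matlab
% Sample lines from the answer above
lines = {'In the cell on the island'; 'there is the man.'; 'The end'};

% Start indices of each whole-word, case-insensitive match, one cell per line
x = regexpi(lines, '\<the\>');

% Count the matches on each line, then add them up
perLine = cellfun(@numel, x);
total = sum(perLine);

fprintf('Matches per line: %s, total: %d\n', mat2str(perLine'), total);
```

For these lines the total should come out as 4. The "there" on the second line is not counted, because \> requires the word to end right after "the".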
Upvotes: 0
Reputation: 1137
Summarizing all comments:
totalthe=length(regexpi(strvcat(lines{:}),'\<the\>'))
Use strvcat instead of strcat, so that a leading "The" does not get stuck to the word at the end of the previous line.
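To see why plain strcat can undercount, here is a small sketch with a hypothetical two-line input. It joins the lines with a space via strjoin instead of strvcat; that is a swapped-in alternative, not what this answer uses, chosen because recent MATLAB documentation lists strvcat as not recommended. The names glued, joined, countGlued and countJoined are illustrative only.

```matlab
% Hypothetical input: the first line ends in a word and the
% second line starts with "the"
lines = {'the end'; 'the start'};

% strcat glues the char vectors together with nothing in between,
% so "end" and "the" merge into "endthe"
glued = strcat(lines{:});
countGlued = numel(regexpi(glued, '\<the\>'));

% Joining with a space keeps the words on either side of a line break apart
joined = strjoin(lines(:)', ' ');
countJoined = numel(regexpi(joined, '\<the\>'));

fprintf('strcat: %d, strjoin: %d\n', countGlued, countJoined);
```

With this input the strcat version finds one match and the space-joined version finds two.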
Upvotes: 0
Reputation: 4768
Sorry, it turns out I may have led you astray in a previous answer. The word boundaries for MATLAB are \< and \> (for the start and end of a word, respectively) instead of \b. I learnt something new today too.
Note that this is preferable to using \s (whitespace), as otherwise you might miss matches at the start and end of the line.
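A quick sketch of that difference, using a made-up test string in which "the" sits at both the very start and the very end (the names s, nWhitespace and nBoundary are illustrative):

```matlab
% Hypothetical test string with "the" at the start and at the end
s = 'the cat saw the';

% \s needs an actual whitespace character on both sides of the word
nWhitespace = numel(regexp(s, '\sthe\s'));

% \< and \> only assert a word boundary, so the string edges also qualify
nBoundary = numel(regexp(s, '\<the\>'));

fprintf('\\s version: %d, boundary version: %d\n', nWhitespace, nBoundary);
```

Here the \s pattern finds nothing, because neither "the" has whitespace on both sides, while the boundary pattern finds both.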
Upvotes: 1