Reputation: 119
For an assignment I need to count the number of sentences in a text file (not the number of lines). That means each sentence ends with '.', '!' or '?'. After struggling a lot I wrote some code, but it gives an error and I cannot see the mistake. If anyone can help, it would be highly appreciated. Thanks.
Here is my code:
fh1 = fopen(nameEssay); %nameEssay is a string of the name of the file with .txt
line1 = fgetl(fh1);
% line1 gives the title of the essay. that is not counted as a sentence
essay = [];
line = ' ';
while ischar(line)
line =fgetl(fh1);
essay = [essay line];
%creates a long string of the whole essay
end
sentenceCount=0;
allScore = [ ];
[sentence essay] = strtok(essay, '.?!');
while ~isempty(sentence)
sentenceCount = sentenceCount + 1;
sentence = [sentence essay(1)];
essay= essay(3:end); %(1st character is a punctuation. 2nd is a space.)
while ~isempty(essay)
[sentence essay] = strtok(essay, '.?!');
end
end
fclose(fh1);
Upvotes: 0
Views: 126
Reputation: 30589
regexp handles this nicely:
>> essay = 'First sentence. Second one? Third! Last one.'
essay =
First sentence. Second one? Third! Last one.
>> sentences = regexp(essay,'\S.*?[\.\!\?]','match')
sentences =
'First sentence.' 'Second one?' 'Third!' 'Last one.'
In the pattern '\S.*?[\.\!\?]', the \S says a sentence starts with a non-whitespace character, and the .*? matches any number of characters (non-greedily) until a punctuation mark ending the sentence ([\.\!\?]) is encountered.
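To connect this back to the question, here is a minimal sketch of counting sentences while skipping the title line. It assumes the whole file has already been read into one character array (for example with fileread(nameEssay)); the text in raw below is a made-up stand-in for that file content, and the variable names are only illustrative.

```matlab
% Hypothetical file content: a title line followed by the essay text.
% In practice this could come from: raw = fileread(nameEssay);
raw = sprintf('My Title\nFirst sentence. Second one?\nThird! Last one.');

% Drop the first line (the title), which is not counted as a sentence.
% strtok returns the text before the first newline and the remainder.
[~, body] = strtok(raw, sprintf('\n'));

% Each match is one sentence: it starts at a non-whitespace character
% and runs lazily up to the first '.', '!' or '?'.
sentences = regexp(body, '\S.*?[.!?]', 'match');
sentenceCount = numel(sentences);
```

Since regexp returns a cell array of matches, numel gives the count directly, and no manual loop over strtok is needed. Note that this simple pattern also treats the periods in abbreviations or decimal numbers as sentence ends.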
Upvotes: 3
Reputation: 238497
If you count sentences based on '.', '!' or '?', you can simply count how many of these characters occur in essay. Thus, if essay is a character array, you can do:
essay = 'Some sentence. Sentence 2! Sentence 3? Sentence 4.';
% count the number of '.', '?' and '!' characters in essay
sum(essay == '.')
sum(essay == '?')
sum(essay == '!')
% gives 2, 1, 1. Thus there are 4 sentences in the example.
If you want the sentences themselves, you can use strsplit as Dan suggested, e.g.
[C, matches] = strsplit(essay,{'.','?', '!'}, 'CollapseDelimiters',true)
% gives
C =
'Some sentence' ' Sentence 2' ' Sentence 3' ' Sentence 4' ''
matches =
'.' '!' '?' '.'
Then count the number of elements in matches, e.g. with numel(matches). In this example the last element of C is empty; it can be filtered out easily.
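As a rough sketch of both steps, the three separate sums can be combined into one count with ismember, and the empty trailing piece can be dropped with cellfun. The example string and variable names here are only illustrative.

```matlab
essay = 'Some sentence. Sentence 2! Sentence 3? Sentence 4.';

% Count every character that is one of the sentence terminators.
sentenceCount = sum(ismember(essay, '.!?'));

% Split into sentences, trim surrounding spaces, and drop empty pieces
% (the string ends with a delimiter, so the last piece is empty).
C = strsplit(essay, {'.', '?', '!'});
C = strtrim(C);
C = C(~cellfun(@isempty, C));
```

For this input, sentenceCount and numel(C) agree. They can differ for text with consecutive terminators such as '?!', because strsplit collapses adjacent delimiters by default while the character count does not.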
Upvotes: 3