user3226108

Reputation: 119

Counting sentences *not* lines of a text file

For an assignment I need to count the number of sentences in a text file (not the number of lines). That means each sentence ends with '.', '!' or '?'. After struggling a lot I wrote some code, but it gives an error and I cannot see the mistake. If anyone can help, that would be highly appreciated. Thanks.

Here is my code

fh1 = fopen(nameEssay); %nameEssay is a string of the name of the file with .txt
line1 = fgetl(fh1); 

% line1 gives the title of the essay. that is not counted as a sentence

essay = [];
line = ' ';
while ischar(line)
    line =fgetl(fh1);
    essay = [essay line];
    %creates a long string of the whole essay
end

sentenceCount=0;
allScore = [ ];



[sentence essay] = strtok(essay, '.?!');
 while ~isempty(sentence)
    sentenceCount = sentenceCount + 1;
    sentence = [sentence essay(1)];

    essay= essay(3:end); %(1st character is a punctuation. 2nd is a space.)
    while ~isempty(essay)
        [sentence essay] = strtok(essay, '.?!');
    end

end
fclose(fh1);

Upvotes: 0

Views: 126

Answers (2)

chappjc

Reputation: 30589

regexp handles this nicely:

>> essay = 'First sentence. Second one? Third! Last one.'
essay =
First sentence. Second one? Third! Last one.
>> sentences = regexp(essay,'\S.*?[\.\!\?]','match')
sentences = 
    'First sentence.'    'Second one?'    'Third!'    'Last one.'

In the pattern '\S.*?[\.\!\?]', \S says that a sentence starts with a non-whitespace character, and .*? matches any number of characters (non-greedily) until a punctuation mark that ends the sentence ([\.\!\?]) is encountered.
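To get the count the question asks for, the matches only need to be counted. A minimal sketch, assuming the whole essay is already in a single char row vector named essay, as in the example above (the variable sentenceCount is borrowed from the question; the backslashes are dropped because these characters need no escaping inside a character class):

```matlab
essay = 'First sentence. Second one? Third! Last one.';

% Each match runs from a non-whitespace character to the nearest '.', '!' or '?'
sentences = regexp(essay, '\S.*?[.!?]', 'match');

% One match per sentence, so the count is the number of matches
sentenceCount = numel(sentences);
```

Note that this treats every '.', '!' or '?' as the end of a sentence, so abbreviations such as "e.g." or decimal numbers would be split as well.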

Upvotes: 3

Marcin

Reputation: 238497

If you only need the number of sentences, and every sentence ends with '.', '!' or '?', you can just count how many of these characters occur in essay. Thus, if essay is a character array, you can do:

essay = 'Some sentece. Sentec 2! Sentece 3? Sentece 4.';


% count the number of '.', '?' and '!' characters in essay
sum(essay == '.')
sum(essay == '?')
sum(essay == '!')

% gives 2, 1, 1. Thus there are 4 sentences in the example.
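The three counts can also be obtained in one step. A minimal sketch, assuming the same essay string as above; isTerminator and sentenceCount are illustrative names, not part of the original answer:

```matlab
essay = 'Some sentece. Sentec 2! Sentece 3? Sentece 4.';

% Logical mask that is true wherever a character is one of '.', '!' or '?'
isTerminator = ismember(essay, '.!?');

% Total number of sentence-ending characters
sentenceCount = sum(isTerminator);
```

Like the three separate sums, this counts characters rather than sentences, so an ellipsis ("...") would add three to the total.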

If you want the sentences themselves, you can use strsplit as Dan suggested, e.g.

[C, matches] = strsplit(essay,{'.','?', '!'}, 'CollapseDelimiters',true)

% gives
C = 

    'Some sentece'    ' Sentec 2'    ' Sentece 3'    ' Sentece 4'    ''


matches = 

    '.'    '!'    '?'    '.'

Then count the number of elements in matches. In this example the last element of C is empty; it can be filtered out easily.
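One way the empty element could be filtered out, and the leading spaces removed, is sketched below. It assumes the same essay string and strsplit call as above; sentenceCount is an illustrative name:

```matlab
essay = 'Some sentece. Sentec 2! Sentece 3? Sentece 4.';
[C, matches] = strsplit(essay, {'.', '?', '!'}, 'CollapseDelimiters', true);

% Remove the leading space left in front of each sentence
C = strtrim(C);

% Drop empty pieces, such as the one after the final '.'
C = C(~cellfun(@isempty, C));

% With collapsed delimiters, each delimiter run is one element of matches
sentenceCount = numel(matches);
```

Because 'CollapseDelimiters' is true, a run such as "?!" counts as a single delimiter, so numel(matches) should not over-count in that case.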

Upvotes: 3
