alexnewby

Reputation: 61

Count the number of sentences in a paragraph using Ruby

I can already split and count sentences that end with a single punctuation mark such as !, ? or .

However, I need it to work for complex sentences such as:

"Learning Ruby is a great endeavor!!!! Well, it can be difficult at times..."

Here you can see the punctuation repeats itself.

Here is what I have so far, which works for simple sentences:

class String
  def count_sentences
    sentence_array = self.split(/[.?!]/)
    return sentence_array.count
  end
end

Thank you!

Upvotes: 2

Views: 1586

Answers (3)

Cary Swoveland

Reputation: 110675

class String
  def count_sentences
    scan(/[.!?]+(?=\s|\z)/).size
  end
end

str = "Learning Ruby is great!!!! The course cost $2.43... How much??!"

str.count_sentences
  #=> 3

(?=\s|\z) is a positive lookahead, requiring the match to be immediately followed by a whitespace character or the end of the string. That is why the period inside $2.43 is not counted.
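To see what the lookahead changes, here is a small sketch that runs the same scan with and without it. The sample text is just part of the string above; the variable names are made up for illustration.

```ruby
# Sample text: the price contains a period that does not end a sentence.
text = "The course cost $2.43... How much??!"

# Without the lookahead, every run of . ! ? is matched,
# including the period inside the price.
without_lookahead = text.scan(/[.!?]+/)

# With the lookahead, a run only matches when whitespace
# or the end of the string follows it.
with_lookahead = text.scan(/[.!?]+(?=\s|\z)/)

p without_lookahead  #=> [".", "...", "??!"]
p with_lookahead     #=> ["...", "??!"]
```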

Upvotes: 3

user1934428

Reputation: 22225

String#count might be easiest.

"Who will treat me to a beer? I bet, alexnewby will!".count('.!?')

Compared to tadman's solution, no intermediate array needs to be constructed. However, it yields incorrect results if, for instance, a run of periods or exclamation marks appears in the string, because every punctuation character is counted individually:

"Now thinking .... Ah, that's it! This is what we have to do!!!".count('.!?')

=> 8
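One possible workaround, offered only as a sketch and not as part of the approach above, is to collapse repeated characters with String#squeeze before counting. This still avoids an intermediate array (it builds one intermediate string instead), but it only merges runs of the same character, so a mixed run such as "??!" is still counted twice.

```ruby
text = "Now thinking .... Ah, that's it! This is what we have to do!!!"

# Plain count: every punctuation character is counted on its own.
p text.count('.!?')                  #=> 8

# squeeze collapses runs of identical characters from the given set
# ("...." becomes ".", "!!!" becomes "!") before counting.
p text.squeeze('.!?').count('.!?')   #=> 3

# Limitation: a run of different characters is not merged.
p "How much??!".squeeze('.!?').count('.!?')  #=> 2
```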

The question therefore is: do you need exact results, or just approximate ones (which might be sufficient if this is used for statistical analysis of, say, large printed texts)? If you need exact results, you have to define what counts as a sentence and what does not. Consider the following text. How many sentences are in it?

 Louise jumped out of the ground floor window. 
 "Stop! Don't run away!", cried Andy. "I did not 
 want to eat your chocolate; you have to believe
 me!" - and, after thinking for a moment, he 
 added: "If you come back, I'll buy you a new
 one! Large one! With hazelnuts!".

BTW, even tadman's solution is not exact. It would give a count of five for the following single sentence:

The IP address of Mr. Sloopsteen's dishwasher is 192.168.101.108!
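To illustrate how much the result depends on the definition, here is a rough sketch that removes a few known abbreviations before counting. The abbreviation list and the method name are hypothetical and far from complete, and a sentence that really ends with such an abbreviation would be missed.

```ruby
# Hypothetical, deliberately incomplete list of abbreviations that end in a period.
ABBREVIATIONS = %w[Mr Mrs Ms Dr Prof].freeze

# Sketch: drop known abbreviations, then count runs of end punctuation
# that are followed by whitespace or the end of the string.
def count_sentences_ignoring_abbreviations(text)
  abbreviation = /\b(?:#{ABBREVIATIONS.join('|')})\./
  text.gsub(abbreviation, '').scan(/[.!?]+(?=\s|\z)/).size
end

sentence = "The IP address of Mr. Sloopsteen's dishwasher is 192.168.101.108!"

p sentence.split(/[.?!]+/).count                    #=> 5
p sentence.scan(/[.!?]+(?=\s|\z)/).size             #=> 2
p count_sentences_ignoring_abbreviations(sentence)  #=> 1
```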

Upvotes: 1

tadman

Reputation: 211570

It's pretty easy to adapt your code to be a little more forgiving:

class String
  def count_sentences
    self.split(/[.?!]+/).count
  end
end

There's no need for the intermediate variable or return.

Note that whitespace-only strings will also be caught up in this, so you may want to filter those out:

test = "This is junk! There's a space at the end! "

That would return 3 rather than 2, because the trailing space ends up as a third element. Here's a fix for that:

class String
  def count_sentences
    self.split(/[.?!]+/).grep(/\S/).count
  end
end

That will select only those strings that have at least one non-space character.
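A quick way to check this is to look at the fragments directly. This is just a sketch using the test string from above; the fragments variable is only for illustration.

```ruby
test = "This is junk! There's a space at the end! "

# split leaves the trailing space as its own fragment.
fragments = test.split(/[.?!]+/)
p fragments        #=> ["This is junk", " There's a space at the end", " "]
p fragments.count  #=> 3

# grep(/\S/) keeps only fragments with at least one non-space character.
p fragments.grep(/\S/).count  #=> 2
```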

Upvotes: 3
