Sahil Shokeen

Reputation: 51

Count the number of sentences in a Java string

Hi, I want to count the number of sentences in a string. So far I am using this:

int count = str.split("[!?.:]+").length;

But my string includes "." in names and in the middle of words, for example:

"He name is Walton D.C. and he just completed his B.Tech last year."

For this example, the line above returns a count of 4 sentences, but there is only one.

How can I deal with these situations?
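
To make the problem concrete, here is a small sketch that prints the pieces the split produces for the example string. The class name SplitDemo and the helper splitOnPunctuation are made up for illustration; only the regex comes from the line above.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper wrapping the one-liner from the question.
    static String[] splitOnPunctuation(String str) {
        return str.split("[!?.:]+");
    }

    public static void main(String[] args) {
        String str = "He name is Walton D.C. and he just completed his B.Tech last year.";
        String[] parts = splitOnPunctuation(str);
        // Every "." starts a new piece, including the dots inside "D.C." and "B.Tech".
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```

The dots inside "D.C." and "B.Tech" are treated exactly like the final full stop, which is where the extra pieces come from.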

Upvotes: 3

Views: 5928

Answers (4)

Abubaker Siddiq

Reputation: 29

An easy way to do it:

public class CountLines {

public static void main(String[] args) {
    // Counts the spaces in the string, so the result is the number of space-separated words.
    String s="Find the number Sentence";
    int count=0;
    for (int i = 0; i < s.length(); i++) {
        if(s.charAt(i)==' ') {
            count++;
        }
    }
    count=count+1;
    System.out.println(count);
}

}

Upvotes: 1

Yashi Srivastava

Reputation: 187

One solution is to treat a dot as a sentence end only when it is followed by a space and a capital letter:

"[dot][space][capital letter]"

That is a reasonably reliable sign of a sentence boundary.

Updated code for this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    // String to be scanned to find the pattern.
    String line = "This order was placed for QT3000! MK? \n Thats amazing. \n But I am not sure.";
    // A terminator, then whitespace, then a capital letter.
    // The capital letter is required but not consumed (lookahead).
    String pattern = "[.!?]\\s+(?=[A-Z])";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern);

    // Now create matcher object.
    Matcher m = r.matcher(line);
    int count = 0;
    while (m.find()) {
        count++;
    }
    count++; // for the last sentence, which has no capital letter after it
    System.out.println("COUNT==" + count);
}

Upvotes: 1

gtgaxiola

Reputation: 9331

You can use BreakIterator, which detects different kinds of text boundaries.

In your case, sentences:

import java.text.BreakIterator;
import java.util.Locale;

private static void markBoundaries(String target, BreakIterator iterator) {
    StringBuffer markers = new StringBuffer();
    markers.setLength(target.length() + 1);
    for (int k = 0; k < markers.length(); k++) {
        markers.setCharAt(k, ' ');
    }
    int count = 0;
    iterator.setText(target);
    int boundary = iterator.first();
    while (boundary != BreakIterator.DONE) {
        markers.setCharAt(boundary, '^');
        ++count;
        boundary = iterator.next();
    }
    System.out.println(target);
    System.out.println(markers);
    System.out.println("Number of Boundaries: " + count);
    System.out.println("Number of Sentences: " + (count-1));
}

public static void main(String[] args) {
    Locale currentLocale = Locale.US;
    BreakIterator sentenceIterator
            = BreakIterator.getSentenceInstance(currentLocale);
    String someText = "He name is Walton D.C. and he just completed his B.Tech last year.";
    markBoundaries(someText, sentenceIterator);
    someText = "This order was placed for QT3000! MK?";
    markBoundaries(someText, sentenceIterator);

}

The output will be:

He name is Walton D.C. and he just completed his B.Tech last year.
^                                                                 ^
Number of Boundaries: 2
Number of Sentences: 1
This order was placed for QT3000! MK?
^                                 ^  ^
Number of Boundaries: 3
Number of Sentences: 2
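
If only the count is needed and not the marker line, the same idea can be reduced to a short helper. This is a minimal sketch: the class name SentenceCounter and the method countSentences are made up for illustration, and skipping whitespace-only segments is my own assumption about what should count as a sentence.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class SentenceCounter {
    // Hypothetical helper: counts the segments between consecutive sentence boundaries.
    static int countSentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        int count = 0;
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // Assumption: a segment that is only whitespace is not a sentence.
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSentences(
                "He name is Walton D.C. and he just completed his B.Tech last year.", Locale.US));
        System.out.println(countSentences("This order was placed for QT3000! MK?", Locale.US));
    }
}
```

Counting segments instead of boundaries avoids the (count-1) adjustment and also returns 0 for an empty string.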

Upvotes: 3

Simion

Reputation: 333

One solution can be to skip a dot if it has one or more uppercase letters before it, which covers names and abbreviations written in uppercase. With this rule your example counts as only one sentence.

Another solution, improving on another answer here, could be: [lowercase]([dot] or [?] or [!])[space][uppercase]

But as I said, without exact rules it will be almost impossible.
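
A rough sketch of the second rule, assuming plain ASCII letters and that the last sentence is always counted even though nothing follows it. The class name RuleBasedCounter and the method countSentences are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleBasedCounter {
    // [lowercase] before, then one or more of . ? !, then whitespace, then [uppercase] after.
    // The letters on both sides are only checked (lookbehind/lookahead), not consumed.
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[a-z])[.?!]+\\s+(?=[A-Z])");

    // Hypothetical helper: number of boundaries found, plus one for the final sentence.
    static int countSentences(String text) {
        if (text.trim().isEmpty()) {
            return 0;
        }
        Matcher m = BOUNDARY.matcher(text);
        int boundaries = 0;
        while (m.find()) {
            boundaries++;
        }
        return boundaries + 1;
    }

    public static void main(String[] args) {
        System.out.println(countSentences(
                "He name is Walton D.C. and he just completed his B.Tech last year."));
        System.out.println(countSentences("It works. Does it? Yes!"));
    }
}
```

This handles "D.C." and "B.Tech" because the dots there follow an uppercase letter, but it misses any sentence that ends in a digit or an uppercase letter (for example "QT3000! MK?" is counted as one), which is the kind of case that needs more exact rules.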

Upvotes: 0
