Reputation: 51
Hi, I want to count the number of sentences in a string. So far I am using this:
int count = str.split("[!?.:]+").length;
But my string also contains "." inside names and abbreviations, for example:
"He name is Walton D.C. and he just completed his B.Tech last year."
For this example the line above returns 4, but there is only one sentence.
How can I deal with these situations?
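To make the miscount concrete, here is a small sketch that prints the pieces produced by the split. The class name `SplitDemo` and the helper `pieces` are made up for illustration; only the regex and the example string come from the question.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of the terminator characters used in the question.
    static String[] pieces(String text) {
        return text.split("[!?.:]+");
    }

    public static void main(String[] args) {
        String text = "He name is Walton D.C. and he just completed his B.Tech last year.";
        String[] parts = pieces(text);
        // Every "." ends a piece, including the dots inside "D.C." and "B.Tech".
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
    }
}
```

The dots inside "D.C." and "B.Tech" each end a piece, which is where the extra three "sentences" come from.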
Upvotes: 3
Views: 5928
Reputation: 29
An easy way to do it:
public class CountLines {
    public static void main(String[] args) {
        String s = "Find the number Sentence";
        int count = 0;
        // Count the spaces in the string.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        // Spaces + 1 is the number of space-separated words, not sentences.
        count = count + 1;
        System.out.println(count); // prints 4
    }
}
Upvotes: 1
Reputation: 187
One solution for the dots is to check whether a space and a capital letter follow them.
"[dot][space][capital letter]"
That is a strong sign of a sentence boundary, though not a guarantee: "Mr. Smith" matches it as well.
Code for this approach:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountSentences {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! MK? \n Thats amazing. \n But I am not sure.";
        // A terminator, then whitespace (\s already covers \n), then a capital letter.
        String pattern = "[.!?]\\s+[A-Z]";
        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);
        // Now create the matcher object.
        Matcher m = r.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        count++; // the last sentence has no boundary after it
        System.out.println("COUNT==" + count);
    }
}
Upvotes: 1
Reputation: 9331
You can use BreakIterator, which detects different kinds of text boundaries.
In your case, sentences:
import java.text.BreakIterator;
import java.util.Locale;

public class BreakIteratorDemo {
    // Prints the text with a '^' under every boundary the iterator reports.
    private static void markBoundaries(String target, BreakIterator iterator) {
        StringBuilder markers = new StringBuilder();
        markers.setLength(target.length() + 1);
        for (int k = 0; k < markers.length(); k++) {
            markers.setCharAt(k, ' ');
        }
        int count = 0;
        iterator.setText(target);
        int boundary = iterator.first();
        while (boundary != BreakIterator.DONE) {
            markers.setCharAt(boundary, '^');
            ++count;
            boundary = iterator.next();
        }
        System.out.println(target);
        System.out.println(markers);
        System.out.println("Number of Boundaries: " + count);
        // N boundaries enclose N - 1 sentences.
        System.out.println("Number of Sentences: " + (count - 1));
    }

    public static void main(String[] args) {
        Locale currentLocale = Locale.US;
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(currentLocale);
        String someText = "He name is Walton D.C. and he just completed his B.Tech last year.";
        markBoundaries(someText, sentenceIterator);
        someText = "This order was placed for QT3000! MK?";
        markBoundaries(someText, sentenceIterator);
    }
}
The output will be:
He name is Walton D.C. and he just completed his B.Tech last year.
^                                                                 ^
Number of Boundaries: 2
Number of Sentences: 1
This order was placed for QT3000! MK?
^                                 ^  ^
Number of Boundaries: 3
Number of Sentences: 2
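If only the count is needed, the boundary walk above can be reduced to a small helper. This is a minimal sketch: the class name `SentenceCounter` and the method `countSentences` are hypothetical, and skipping whitespace-only segments is my own assumption about what should count as a sentence.

```java
import java.text.BreakIterator;
import java.util.Locale;

class SentenceCounter {
    // Counts the non-blank segments between consecutive sentence boundaries.
    static int countSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Assumption: a segment containing only whitespace is not a sentence.
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "This order was placed for QT3000! MK?";
        System.out.println(countSentences(text, Locale.US));
    }
}
```

The result still depends on the locale's break rules, so abbreviations followed by a capitalized word (for example "Mr. Smith") may be split.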
Upvotes: 3
Reputation: 333
One solution can be to skip dots that have one or more uppercase letters directly before them, which covers names written in uppercase. With this rule the example yields only one sentence.
Another solution, improving on one of the answers here, could be: [lowercase]([dot] or [?] or [!])[space][uppercase]
But like I said, if there are no exact rules, it will be almost impossible.
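The second rule could be sketched as a regex like this. The class name `BoundaryRegexCounter` and the method `countSentences` are made up for illustration, and treating "lowercase" and "uppercase" as ASCII `[a-z]` and `[A-Z]` is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryRegexCounter {
    // A terminator preceded by a lowercase letter and followed by
    // whitespace and then an uppercase letter (ASCII only, by assumption).
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[a-z])[.?!]\\s+(?=[A-Z])");

    static int countSentences(String text) {
        if (text.trim().isEmpty()) {
            return 0;
        }
        Matcher m = BOUNDARY.matcher(text);
        int boundaries = 0;
        while (m.find()) {
            boundaries++;
        }
        // The final sentence has no boundary after it.
        return boundaries + 1;
    }

    public static void main(String[] args) {
        String text = "He name is Walton D.C. and he just completed his B.Tech last year.";
        System.out.println(countSentences(text)); // prints 1
    }
}
```

The rule has the gaps you would expect: a sentence ending in a digit or an uppercase letter is missed, so "This order was placed for QT3000! MK?" is counted as one sentence.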
Upvotes: 0