Archie

Reputation: 5421

BreakIterator doesn't find correct sentence boundary with parenthesized "i.e." or "e.g."

In the example below, BreakIterator appears to split a fairly simple sentence in the wrong place: it reports a sentence boundary inside the parenthesized "e.g.".

Am I using BreakIterator incorrectly, or is this just a bug?

Example class:

import java.text.BreakIterator;
import java.util.Locale;

public class BreakIteratorTest {
    public static void main(String[] args) {
        // A single sentence containing a parenthesized abbreviation.
        String text = "Due to a problem (e.g., software bug), the server is down.";
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        // Ask for the last sentence boundary before offset 30 (inside "software").
        int r = bi.preceding(30);
        System.out.println("bi.preceding(30) returned " + r);
        // Fall back to the whole text if no boundary is reported.
        String sentence = r == BreakIterator.DONE ? text : text.substring(0, r);
        System.out.println("first sentence: \"" + sentence + "\"");
    }
}

Output:

$ javac BreakIteratorTest.java 
$ java BreakIteratorTest
bi.preceding(30) returned 21
first sentence: "Due to a problem (e.g"

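Since the text is a single sentence, I expected no boundary inside the parenthetical: bi.preceding(30) should have returned 0, the start of the text, rather than 21. (Per the Javadoc, BreakIterator.DONE is only returned when the offset is itself the first boundary.)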

JDK version 1.8.0.
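
To see whether the stray boundary is specific to preceding() or shows up in a plain forward scan as well, it may help to list every boundary the iterator reports. The following is only a diagnostic sketch: the class name SentenceBoundaryDump and the helper boundaries() are hypothetical names introduced here, and the output is deliberately not shown because it depends on the JDK's sentence rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic class, not part of the original test.
public class SentenceBoundaryDump {

    // Collects every boundary the sentence iterator reports, from first() until DONE.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = bi.first(); b != BreakIterator.DONE; b = bi.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Due to a problem (e.g., software bug), the server is down.";
        List<Integer> bs = boundaries(text, Locale.US);
        System.out.println("boundaries: " + bs);
        // Print each segment between consecutive boundaries in brackets.
        for (int i = 1; i < bs.size(); i++) {
            System.out.println("[" + text.substring(bs.get(i - 1), bs.get(i)) + "]");
        }
    }
}
```

If the list contains anything other than 0 and the length of the text, the sentence rules themselves are splitting the text, independent of how preceding() is called.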

Upvotes: 4

Views: 200

Answers (0)
