wolve80

Reputation: 105

Print the first sentence of each paragraph in Java

I have a text file and wish to print the first sentence of each paragraph. Paragraphs are separated by a line break, i.e. "\n".

Looking at BreakIterator, I thought I could use getLineInstance() for this, but it seems to iterate over each word instead:

public String[] extractFirstSentences() {
    BreakIterator boundary = BreakIterator.getLineInstance(Locale.US);
    boundary.setText(getText());

    List<String> sentences = new ArrayList<String>();
    int start = boundary.first();
    int end = boundary.next();
    while (end != BreakIterator.DONE) {
        String sentence = getText().substring(start, end).trim();
        if (!sentence.isEmpty()) {
            sentences.add(sentence);
        }
        start = end;
        end = boundary.next();
    }

    return sentences.toArray(new String[sentences.size()]);
}

Am I using getLineInstance() incorrectly or is there another method to do what I want?

Upvotes: 2

Views: 2729

Answers (1)

aroth

Reputation: 54796

How about this as an alternative:

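public String[] extractFirstSentences() {
    String myText = getText();
    String[] paragraphs = myText.split("\\n");
    List<String> result = new ArrayList<String>();
    for (String paragraph : paragraphs) {
        if (paragraph.trim().isEmpty()) {
            continue; // skip blank lines between paragraphs
        }
        // Split after '.', '?' or '!' followed by whitespace; the lookbehind
        // keeps the original terminator instead of always appending ".".
        result.add(paragraph.trim().split("(?<=[.?!])\\s+")[0]);
    }

    return result.toArray(new String[result.size()]);
}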
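
If you would rather stay with BreakIterator: getLineInstance() reports every position where a line could be wrapped, which is why it looks like it is stepping word by word. getSentenceInstance() is probably closer to what you want. Below is a rough sketch, not a drop-in replacement. The class name FirstSentences and the static extract(String) method are made up for illustration, and it takes the text as a parameter instead of calling your getText(). It also assumes Java 8+ for the "\\R" line-break pattern.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class; the name and method are illustrative only.
class FirstSentences {

    // Returns the first sentence of each non-blank paragraph,
    // where paragraphs are separated by line breaks.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);

        // "\\R" matches \n, \r\n and other line-break sequences (Java 8+).
        for (String paragraph : text.split("\\R")) {
            String trimmed = paragraph.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines between paragraphs
            }
            sentences.setText(trimmed);
            int start = sentences.first();
            int end = sentences.next();
            if (end == BreakIterator.DONE) {
                end = trimmed.length(); // defensive fallback
            }
            // The first boundary usually includes trailing whitespace, so trim it.
            result.add(trimmed.substring(start, end).trim());
        }
        return result;
    }
}
```

You would call it as FirstSentences.extract(getText()) and convert the list to an array if you need one. Note that the sentence rules are heuristic, so abbreviations such as "Mr." may still be treated as sentence ends.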

Upvotes: 2
